DB2 UDB V8.2 SQL Cookbook
SQL Cookbook
Graeme Birchall
3-Nov-2004
Preface
Important!
If you didn't get this document directly from my website, you may have got an older edition.
The book gets changed all the time, so if you want the latest, go to the source. Also, the latest
edition is usually the best book to have, even if you are using an older version of DB2, as the
examples are often much better.
This Cookbook is for DB2 UDB for Windows, UNIX, Linux, OS/2, etc. It is not suitable for
DB2 for z/OS or DB2 for AS/400. The SQL in these two products is quite different.
Disclaimer & Copyright
DISCLAIMER: This document is a best effort on my part. However, I screw up all the time,
so it would be extremely unwise to trust the contents in its entirety. I certainly don't. And if
you do something silly based on what I say, life is tough.
COPYRIGHT: You can make as many copies of this book as you wish. And I encourage you
to give it to others. But you cannot sell it, nor charge for it (other than to recover reproduction
costs), nor claim the material as your own, nor replace my name with another. Secondary distribution for gain is not allowed. You are also encouraged to use the related class notes for
teaching. In this case, you can charge for your time and materials (and your expertise). But
you cannot charge any licensing fee, nor claim an exclusive right of use.
TRADEMARKS: Lots of words in this document, like "DB2", are registered trademarks of
the IBM Corporation. And lots of other words, like "Windows", are registered trademarks of
the Microsoft Corporation. Acrobat is a registered trademark of the Adobe Corporation.
Tools Used
This book was written on a Dell PC that came with oodles of RAM. All testing was done on
DB2 V8.2. Word for Windows was used to write the document. Adobe Acrobat was used to
make the PDF file.
Book Binding
This book looks best when printed on a double-sided laser printer and then suitably bound.
To this end, I did some experiments a few years ago to figure out how to bind books cheaply
using commonly available materials. I came up with what I consider to be a very satisfactory
solution that is fully documented on page 379.
Author Notes
Book History
This book originally began as a series of notes for my own use. After a while, friends began to
ask for copies, and enemies started to steal it, so I decided to tidy everything up and give it
away. Over the years, new chapters have been added as DB2 has evolved, and I have found
new ways to solve problems. Hopefully, this process will continue for the foreseeable future.
Why Free
This book is free because I want people to use it. The more people that use it, and the more
that it helps them, then the more inclined I am to keep it up to date. For these reasons, if you
find this book to be useful, please share it with others.
This book is free, rather than formally published, because I want to deliver the best product
that I can. If I had a publisher, I would have the services of an editor and a graphic designer,
but I would not be able to get to market so quickly, and when a product changes as quickly as
DB2 does, timeliness is important. Also, giving it away means that I am under no pressure to
make the book marketable. I simply include whatever I think might be useful.
Other Free Documents
The following documents are also available for free from my web site:
SAMPLE SQL: The complete text of the SQL statements in this Cookbook is available
in an HTML file. Only the first and last few lines of the file have HTML tags; the rest is
raw text, so it can easily be cut and pasted into other files.
CLASS OVERHEADS: Selected SQL examples from this book have been rewritten as
class overheads. This enables one to use this material to teach DB2 SQL to others. Use
this cookbook as the student notes.
OLDER EDITIONS: This book is rewritten, and usually much improved, with each new
version of DB2. Some of the older editions are available from my website. The others can
be emailed upon request. However, the latest edition is the best, so you should probably
use it, regardless of the version of DB2 that you have.
Answering Questions
As a rule, I do not answer technical questions because I need to have a life. But I'm interested
in hearing about interesting SQL problems, and also about any bugs in this book. However,
you may not get a prompt response, or any response. And if you are obviously an idiot, don't
be surprised if I point out (for free, remember) that you are an idiot.
Graeme
Book Editions
Upload Dates
1996-05-08: First edition of the DB2 V2.1.1 SQL Cookbook was posted to my web site.
This version was in Postscript Print File format.
1998-02-26: The DB2 V2.1.1 SQL Cookbook was converted to an Adobe Acrobat file
and posted to my web site. Some minor cosmetic changes were made.
1998-08-19: First edition of DB2 UDB V5 SQL Cookbook posted. Every SQL statement
was checked for V5, and there were new chapters on OUTER JOIN and GROUP BY.
1998-10-25: About twenty minor typos and sundry cosmetic defects were fixed.
1998-12-03: IBM published two versions of the V5.2 upgrade. The initial edition, which
I had used, evidently had a lot of problems. It was replaced within a week with a more
complete upgrade. This book was based on the later upgrade.
1999-01-25: A chapter on Summary Tables (new in the Dec/98 fixpack) was added and
all the SQL was checked for changes.
1999-01-28: Some more SQL was added to the new chapter on Summary Tables.
1999-02-15: The section on stopping recursive SQL statements was completely rewritten,
and a new section was added on denormalizing hierarchical data structures.
1999-03-16: Some bright spark at IBM pointed out that my new and improved section on
stopping recursive SQL was all wrong. Damn. I undid everything.
1999-05-12: Minor editorial changes were made, and one new example (on getting multiple counts from one value) was added.
1999-09-16: DB2 V6.1 edition. All SQL was rechecked, and there were some minor additions - especially to summary tables, plus a chapter on "DB2 Dislikes".
1999-10-06: Some bugs fixed, plus new section on index usage in summary tables.
2000-04-12: Some typos fixed, and a couple of new SQL tricks were added.
2000-09-19: DB2 V7.1 edition. All SQL was rechecked. The new areas covered are:
OLAP functions (whole chapter), ISO functions, and identity columns.
2001-04-11: Document new features in latest fixpack. Also add a new chapter on Identity Columns and completely rewrite sub-query chapter.
2001-10-24: DB2 V7.2 fixpack 4 edition. Tested all SQL and added more examples, plus
a new section on the aggregation function.
2002-08-20: DB2 V8.1 (beta) edition. A few new functions are added, plus there is a
new section on temporary tables. The Identity Column and Join chapters were completely
rewritten, and the Whine chapter was removed.
2003-01-02: DB2 V8.1 (post-Beta) edition. SQL rechecked. More examples added.
2003-07-11: New chapters added for temporary tables, compound SQL, and user defined
functions. New DML section also added. Halting recursion section changed to use user-defined function.
2003-12-31: Tidied up the SQL in the Recursion chapter, and added a section on the
merge statement. Completely rewrote the chapter on materialized query tables.
2004-02-04: Added select-from-DML section, and tidied up some code. Also managed
to waste three whole days due to bugs in Microsoft Word.
2004-07-23: Rewrote chapter on identity columns and sequences. Made DML separate
chapter. Added chapters on protecting data and XML functions. Other minor changes.
2004-11-03: Upgraded to V8.2. Retested all SQL. Documented new SQL features. Some
major hacking done on the GROUP BY chapter. Did not add anything regarding the new
SQL Procedural Language in V8.2 due to lack of decent IBM documentation.
Software Whines
This book is written using Microsoft Word for Windows. I've been using this word processor
for over ten years, and it has generally been a bunch of bug-ridden junk, but I do confess that
it has gotten a little better in recent years. However, I could have written more than twice as
much that was twice as good in half the time - if it weren't for all of the bugs in Word.
Table of Contents
PREFACE .... 3
AUTHOR NOTES .... 4
BOOK EDITIONS .... 5
TABLE OF CONTENTS .... 7
QUICK FIND .... 13
  Index of Concepts .... 13
  SQL Components .... 18
    DB2 Objects .... 18
    DB2 Data Types .... 20
    Date/Time Arithmetic .... 21
    DB2 Special Registers .... 23
    Distinct Types .... 24
    SELECT Statement .... 25
    FETCH FIRST Clause .... 27
    Correlation Name .... 28
    Renaming Fields .... 29
    Working with Nulls .... 29
    Quotes and Double-quotes .... 30
  CAST Expression .... 36
  VALUES Clause .... 38
  CASE Expression .... 40
    Conclusion .... 80
  Join Types .... 210
    Inner Join .... 210
    Left Outer Join .... 211
    Right Outer Join .... 213
    Full Outer Joins .... 214
    Cartesian Product .... 218
  Sequences .... 267
    Getting the Sequence Value .... 267
    Multi-table Usage .... 269
    Counting Deletes .... 271
    Identity Columns vs. Sequences - a Comparison .... 271
Quick Find
This brief chapter is for those who want to find how to do something, but are not sure what
the task is called. Hopefully, this list will identify the concept.
Index of Concepts
Join Rows
To combine matching rows in multiple tables, use a join (see page 207).
EMP_NM          EMP_JB
+----------+    +--------+
|ID|NAME   |    |ID|JOB  |
|--|-------|    |--|-----|
|10|Sanders|    |10|Sales|
|20|Pernal |    |20|Clerk|
|50|Hanes  |    +--------+
+----------+

SELECT   nm.id
        ,nm.name
        ,jb.job
FROM     emp_nm nm
        ,emp_jb jb
WHERE    nm.id = jb.id
ORDER BY 1;

ANSWER
================
ID NAME    JOB
-- ------- -----
10 Sanders Sales
20 Pernal  Clerk

To get all of the rows from one table, plus the matching rows from another table (if there are
any), use an outer join (see page 210).
EMP_NM          EMP_JB
+----------+    +--------+
|ID|NAME   |    |ID|JOB  |
|--|-------|    |--|-----|
|10|Sanders|    |10|Sales|
|20|Pernal |    |20|Clerk|
|50|Hanes  |    +--------+
+----------+

SELECT   nm.id
        ,nm.name
        ,jb.job
FROM     emp_nm nm
LEFT OUTER JOIN
         emp_jb jb
ON       nm.id = jb.id
ORDER BY nm.id;

ANSWER
================
ID NAME    JOB
-- ------- -----
10 Sanders Sales
20 Pernal  Clerk
50 Hanes   -

Use the COALESCE function (see page 118) to replace a null value (e.g. generated in an
outer join) with a non-null value.
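As a minimal sketch of that idea, the outer join above can be rerun with COALESCE wrapped around the nullable column. The replacement text 'Unknown' and the output column name are assumptions chosen for illustration, not values taken from this book.

```sql
-- Replace the null JOB generated by the outer join with a literal.
-- 'Unknown' is an arbitrary, illustrative replacement value.
SELECT   nm.id
        ,nm.name
        ,COALESCE(jb.job, 'Unknown') AS job
FROM     emp_nm nm
LEFT OUTER JOIN
         emp_jb jb
ON       nm.id = jb.id
ORDER BY nm.id;
```

With the sample data shown above, the row for Hanes (ID 50) should come back with the literal in place of the null.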
Select Where No Match
To get the set of the matching rows from one table where something is true or false in another
table (e.g. no corresponding row), use a sub-query (see page 229).
EMP_NM          EMP_JB
+----------+    +--------+
|ID|NAME   |    |ID|JOB  |
|--|-------|    |--|-----|
|10|Sanders|    |10|Sales|
|20|Pernal |    |20|Clerk|
|50|Hanes  |    +--------+
+----------+

SELECT   *
FROM     emp_nm nm
WHERE    NOT EXISTS
        (SELECT *
         FROM   emp_jb jb
         WHERE  nm.id = jb.id)
ORDER BY id;

ANSWER
========
ID NAME
== =====
50 Hanes

Append Rows
To add (append) one set of rows to another set of rows, use a union (see page 243).
EMP_NM          EMP_JB
+----------+    +--------+
|ID|NAME   |    |ID|JOB  |
|--|-------|    |--|-----|
|10|Sanders|    |10|Sales|
|20|Pernal |    |20|Clerk|
|50|Hanes  |    +--------+
+----------+

SELECT   *
FROM     emp_nm
WHERE    name < 'S'
UNION
SELECT   *
FROM     emp_jb
ORDER BY 1,2;

ANSWER
=========
ID 2
-- ------
10 Sales
20 Clerk
20 Pernal
50 Hanes

To assign line numbers to SQL output, use the ROW_NUMBER function (see page 98).
EMP_JB
+--------+
|ID|JOB  |
|--|-----|
|10|Sales|
|20|Clerk|
+--------+

SELECT   id
        ,job
        ,ROW_NUMBER() OVER(ORDER BY job) AS R
FROM     emp_jb
ORDER BY job;

ANSWER
==========
ID JOB   R
-- ----- -
20 Clerk 1
10 Sales 2

To make each row inserted into a table automatically get a unique key value, use an identity
column, or a sequence, when creating the table (see page 259).
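The sequence approach can be sketched as follows. The sequence name invoice_seq and the invoice table with its columns are hypothetical, made up purely for this illustration; an identity-column example appears later in the data types discussion.

```sql
-- Hypothetical sequence and table, for illustration only.
CREATE SEQUENCE invoice_seq AS INTEGER
   START WITH 1
   INCREMENT BY 1
   NO CYCLE;

CREATE TABLE invoice
(inv_id     INTEGER      NOT NULL
,cust_name  VARCHAR(20)  NOT NULL
,PRIMARY KEY(inv_id));

-- Each insert asks the sequence for the next key value.
INSERT INTO invoice (inv_id, cust_name)
VALUES (NEXTVAL FOR invoice_seq, 'Sanders');

INSERT INTO invoice (inv_id, cust_name)
VALUES (NEXTVAL FOR invoice_seq, 'Pernal');

SELECT   inv_id
        ,cust_name
FROM     invoice
ORDER BY inv_id;
```

Because the sequence is a separate object, the same NEXTVAL reference could be used when inserting into other tables, which is the main reason to prefer it over an identity column.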
If-Then-Else Logic
To include if-then-else logical constructs in SQL statements, use the CASE expression (see page 40).
EMP_JB
+--------+
|ID|JOB  |
|--|-----|
|10|Sales|
|20|Clerk|
+--------+

SELECT   id
        ,job
        ,CASE
            WHEN job = 'Sales'
            THEN 'Fire'
            ELSE 'Demote'
         END AS STATUS
FROM     emp_jb;

ANSWER
===============
ID JOB   STATUS
-- ----- ------
10 Sales Fire
20 Clerk Demote

To get all of the dependents of some object, regardless of the degree of separation from the
parent to the child, use recursion (see page 289).
FAMILY
+-----------+
|PARNT|CHILD|
|-----|-----|
|GrDad|Dad  |
|Dad  |Dghtr|
|Dghtr|GrSon|
|Dghtr|GrDtr|
+-----------+

ANSWER
=========
PERSN LVL
----- ---
Dad     1
Dghtr   2
GrSon   3
GrDtr   3

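The SQL for this example is not shown above, so here is a minimal sketch of one way to write it. It assumes a table named family with the columns parnt and child as pictured; the temporary table name temp1 and the choice of 'Dad' as the starting point are assumptions made for illustration.

```sql
-- Walk the FAMILY hierarchy downwards, starting at 'Dad'.
WITH temp1 (persn, lvl) AS
  (SELECT parnt            -- starting row: the parent of interest
         ,1
   FROM   family
   WHERE  parnt = 'Dad'
   UNION ALL
   SELECT f.child          -- recursive step: children of rows found so far
         ,t.lvl + 1
   FROM   temp1  t
         ,family f
   WHERE  t.persn = f.parnt)
SELECT persn
      ,lvl
FROM   temp1;
```

Against the FAMILY data above this should return the four rows listed in the ANSWER, though without an ORDER BY the row sequence is not guaranteed. DB2 may also issue a warning that the query could loop indefinitely, which is normal for recursive SQL of this kind.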
To convert a (potentially large) set of values in a string (character field) into separate rows
(e.g. one row per word), use recursion (see page 344).
INPUT DATA           Recursive SQL    ANSWER
=================    ============>    ===========
"Some silly text"                     TEXT  LINE#
                                      ----- -----
                                      Some      1
                                      silly     2
                                      text      3

To convert a (potentially large) set of values that are in multiple rows into a single combined
field, use recursion (see page 345).
INPUT DATA     Recursive SQL    ANSWER
===========    ============>    =================
TEXT  LINE#                     "Some silly text"
----- -----
Some      1
silly     2
text      3

To fetch the first "n" matching rows, use the FETCH FIRST notation (see page 27).
EMP_NM
+----------+
|ID|NAME   |
|--|-------|
|10|Sanders|
|20|Pernal |
|50|Hanes  |
+----------+

SELECT   *
FROM     emp_nm
ORDER BY id DESC
FETCH FIRST 2 ROWS ONLY;

ANSWER
=========
ID NAME
-- ------
50 Hanes
20 Pernal

To fetch the "n" through "n + m" rows, first use the ROW_NUMBER function to assign
output numbers, then put the result in a nested-table-expression, and then fetch the rows with
desired numbers (see page 99).
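The three steps just described can be sketched like this, using the EMP_NM table from the earlier examples. The correlation name xxx, the column name r, and the choice of rows 2 through 3 are illustrative assumptions.

```sql
-- Number the rows in a nested table expression, then keep rows 2 to 3.
SELECT   id
        ,name
FROM    (SELECT id
               ,name
               ,ROW_NUMBER() OVER(ORDER BY id) AS r
         FROM   emp_nm
        ) AS xxx
WHERE    r BETWEEN 2 AND 3
ORDER BY id;
```

With the three sample rows shown earlier, this should return Pernal (ID 20) and Hanes (ID 50). The row number has to be filtered in the outer query because it cannot be referenced in the WHERE clause of the query that generates it.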
Fetch Uncommitted Data
To retrieve data that may have been changed by another user, but which they have yet to
commit, use the WITH UR (Uncommitted Read) notation.
EMP_NM
+----------+
|ID|NAME   |
|--|-------|
|10|Sanders|
|20|Pernal |
|50|Hanes  |
+----------+

SELECT   *
FROM     emp_nm
WHERE    name LIKE 'S%'
WITH UR;

ANSWER
==========
ID NAME
-- -------
10 Sanders

Use a column function (see page 81) to summarize the contents of a column.
EMP_NM
+----------+
|ID|NAME   |
|--|-------|
|10|Sanders|
|20|Pernal |
|50|Hanes  |
+----------+

SELECT   AVG(id)   AS avg
        ,MAX(name) AS maxn
        ,COUNT(*)  AS #rows
FROM     emp_nm;

ANSWER
=================
AVG MAXN    #ROWS
--- ------- -----
 26 Sanders     3

To obtain subtotals and grand-totals, use the ROLLUP or CUBE statements (see page 195).
SELECT   job
        ,dept
        ,SUM(salary) AS sum_sal
        ,COUNT(*)    AS #emps
FROM     staff
WHERE    dept   < 30
  AND    salary < 20000
  AND    job    < 'S'
GROUP BY ROLLUP(job, dept)
ORDER BY job
        ,dept;

ANSWER
=======================
JOB   DEPT SUM_SAL  #EMP
----- ---- -------- ----
Clerk   15 24766.70    2
Clerk   20 27757.35    2
Clerk    - 52524.05    4
Mgr     10 19260.25    1
Mgr     20 18357.50    1
Mgr      - 37617.75    2
-        - 90141.80    6

When a table is created, various DB2 features can be used to ensure that the data entered in
the table is always correct:
Check constraints can be defined to limit the values that a column can have.
Default values (for a column) can be defined - to be used when no value is provided.
Identity columns (see page 259) can be defined to automatically generate unique numeric values (e.g. invoice numbers) for all of the rows in a table. Sequences can do the
same thing over multiple tables.
Referential integrity rules can be created to enforce key relationships between tables.
Triggers can be defined to enforce more complex integrity rules, and also to do things
(e.g. populate an audit trail) whenever data is changed.
See the DB2 manuals for documentation or page 73 for more information about the above.
Hide Complex SQL
One can create a view (see page 18) to hide complex SQL that is run repetitively. Be warned
however that doing so can make it significantly harder to tune the SQL - because some of the
logic will be in the user code, and some in the view definition.
Summary Table
Some queries that use a GROUP BY can be made to run much faster by defining a summary
table (see page 247) that DB2 automatically maintains. Subsequently, when the user writes
the original GROUP BY against the source-data table, the optimizer substitutes with a much
simpler (and faster) query against the summary table.
Introduction to SQL
This chapter contains a basic introduction to DB2 UDB SQL. It also has numerous examples
illustrating how to use this language to answer particular business problems. However, it is
not meant to be a definitive guide to the language. Please refer to the relevant IBM manuals
for a more detailed description.
Syntax Diagram Conventions
This book uses railroad diagrams to describe the DB2 UDB SQL statements. The following
diagram shows the conventions used.
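[Syntax diagram: SELECT, optionally followed by ALL (the default) or DISTINCT, then one or more comma-separated items or *, then FROM with one or more comma-separated table names or view names, then an optional WHERE with expressions joined by and / or. Labels on the diagram mark the Start, Continue, Resume, End, Default, Repeat, Mandatory, and Optional conventions.]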
Statement Delimiter
DB2 SQL does not come with a designated statement delimiter (terminator), though a semicolon is often used. A semi-colon cannot be used when writing a compound SQL statement
(see page 63) because that character is used to terminate the various sub-components of the
statement.
In DB2BATCH one can set the statement delimiter using an intelligent comment:
--#SET DELIMITER !
SELECT name FROM staff WHERE id = 10!
--#SET DELIMITER ;
SELECT name FROM staff WHERE id = 20;
SQL Components
DB2 Objects
DB2 is a relational database that supports a variety of object types. In this section we shall
overview those items which one can obtain data from using SQL.
Table
A table is an organized set of columns and rows. The number, type, and relative position, of
the various columns in the table is recorded in the DB2 catalogue. The number of rows in the
table will fluctuate as data is inserted and deleted.
The CREATE TABLE statement is used to define a table. The following example will define
the EMPLOYEE table, which is found in the DB2 sample database.
CREATE TABLE employee
(empno      CHARACTER (00006)     NOT NULL
,firstnme   VARCHAR   (00012)     NOT NULL
,midinit    CHARACTER (00001)     NOT NULL
,lastname   VARCHAR   (00015)     NOT NULL
,workdept   CHARACTER (00003)
,phoneno    CHARACTER (00004)
,hiredate   DATE
,job        CHARACTER (00008)
,edlevel    SMALLINT              NOT NULL
,SEX        CHARACTER (00001)
,birthdate  DATE
,salary     DECIMAL   (00009,02)
,bonus      DECIMAL   (00009,02)
,comm       DECIMAL   (00009,02)
)
DATA CAPTURE NONE;
A view is another way to look at the data in one or more tables (or other views). For example,
a user of the following view will only see those rows (and certain columns) in the EMPLOYEE table where the salary of a particular employee is greater than or equal to the average salary for their particular department.
CREATE VIEW employee_view AS
SELECT   a.empno, a.firstnme, a.salary, a.workdept
FROM     employee a
WHERE    a.salary >=
        (SELECT AVG(b.salary)
         FROM   employee b
         WHERE  a.workdept = b.workdept);
SELECT   c1, c2, c3
FROM     silly
ORDER BY c1 ASC;

ANSWER
===========
C1 C2  C3
-- --- --
11 AAA 22
12 BBB 33
13 CCC  -

Figure 19, SELECT from a view that has its own data
We can go one step further and define a view that begins with a single value that is then manipulated using SQL to make many other values. For example, the following view, when selected from, will return 10,000 rows. Note however that these rows are not stored anywhere in
the database - they are instead created on the fly when the view is queried.
CREATE VIEW test_data AS
WITH temp1 (num1) AS
  (VALUES (1)
   UNION ALL
   SELECT num1 + 1
   FROM   temp1
   WHERE  num1 < 10000)
SELECT *
FROM   temp1;
An alias is an alternate name for a table or a view. Unlike a view, an alias cannot contain any
processing logic. No authorization is required to use an alias other than that needed to access
the underlying table or view.
CREATE ALIAS
COMMIT;
CREATE ALIAS
COMMIT;
CREATE ALIAS
COMMIT;
A nickname is the name that one provides to DB2 for either a remote table, or a non-relational
object that one wants to query as if it were a table.
CREATE NICKNAME emp FOR unixserver.production.employee;
Use of the optional TABLESAMPLE reference enables one to randomly select (sample) some
fraction of the rows in the underlying base table:
SELECT   *
FROM     staff TABLESAMPLE BERNOULLI(10);
BLOB, CLOB, and DBCLOB (i.e. binary and character long object values).
Below is a simple table definition that uses the above data types:
CREATE TABLE sales_record
(sales#        INTEGER        NOT NULL
               GENERATED ALWAYS AS IDENTITY
               (START WITH 1
               ,INCREMENT BY 1
               ,NO MAXVALUE
               ,NO CYCLE)
,sale_ts       TIMESTAMP      NOT NULL
,num_items     SMALLINT       NOT NULL
,payment_type  CHAR(2)        NOT NULL
,sale_value    DECIMAL(12,2)  NOT NULL
,sales_tax     DECIMAL(12,2)
,employee#     INTEGER        NOT NULL
,CONSTRAINT sales1 CHECK(payment_type IN ('CS','CR'))
,CONSTRAINT sales2 CHECK(sale_value > 0)
,CONSTRAINT sales3 CHECK(num_items  > 0)
,CONSTRAINT sales4 FOREIGN KEY(employee#)
                   REFERENCES staff(id)
                   ON DELETE RESTRICT
,PRIMARY KEY(sales#));
The sales# is automatically generated (see page 259 for details). It is also the primary key
of the table, and so must always be unique.
Both the sales-value and the num-items must be greater than zero.
The employee# must already exist in the staff table. Furthermore, once a row has been
inserted into this table, any attempt to delete the related row from the staff table will fail.
Default Lengths
The length has not been provided for either of the above columns. In this case, DB2 defaults
to CHAR(1) for the first column and DECIMAL(5,0) for the second column.
Data Type Usage
A DB2 data type is not just a place to hold data. It also defines what rules are applied when
the data is manipulated. For example, storing monetary data in a DB2 floating-point field is a
no-no, in part because the data-type is not precise, but also because a floating-point number is
not manipulated (e.g. during division) according to internationally accepted accounting rules.
Date/Time Arithmetic
Manipulating date/time values can sometimes give unexpected results. What follows is a brief
introduction to the subject. The basic rules are:
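LABELED DURATION   ITEM FIXED SIZE
================   ===============
YEAR(S)            N
MONTH(S)           N
DAY(S)             Y
HOUR(S)            Y
MINUTE(S)          Y
SECOND(S)          Y
MICROSECOND(S)     Y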
It doesn't matter if one uses singular or plural. One can add "4 day" to a date.
Some months and years are longer than others. So when one adds "2 months" to a date
the result is determined, in part, by the date that you began with. More on this below.
One cannot combine labeled durations in parentheses: "date - (1 day + 2 months)" will
fail. One should instead say: "date - 1 day - 2 months".
Adding too many hours, minutes or seconds to a time will cause it to wrap around. The
overflow will be lost.
Adding 24 hours to the time 00.00.00 will get 24.00.00. Adding 24 hours to any other
time will return the original value.
When a decimal value is used (e.g. 4.5 days) the fractional part is discarded. So to add (to
a timestamp value) 4.5 days, add 4 days and 12 hours.
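The last rule above can be sketched as a one-row query. This is only an illustration: it uses the SYSIBM.SYSDUMMY1 catalog table as a convenient single-row source, and the output column names are made up.

```sql
-- Add four and a half days to a timestamp by splitting the
-- duration into whole days plus hours. Writing "+ 4.5 DAYS"
-- would silently drop the fractional part.
SELECT CURRENT TIMESTAMP                     AS ts_now
      ,CURRENT TIMESTAMP + 4 DAYS + 12 HOURS AS ts_later
FROM   sysibm.sysdummy1;
```

The two labeled durations are applied one after the other, which is also the form required by the rule that durations cannot be combined inside parentheses.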
                                              ANSWER
                                              ==========
SELECT sales_date                          <= 1995-12-31
      ,sales_date - 10    DAY    AS d1     <= 1995-12-21
      ,sales_date + -1    MONTH  AS d2     <= 1995-11-30
      ,sales_date + 99    YEARS  AS d3     <= 2094-12-31
      ,sales_date + 55    DAYS
                  - 22    MONTHS AS d4     <= 1994-04-24
      ,sales_date + (4+6) DAYS   AS d5     <= 1996-01-10
FROM   sales
WHERE  sales_person = 'GOUNOT'
  AND  sales_date   = '1995-12-31'

                                              ANSWER
                                              ==========
SELECT sales_date                          <= 1995-12-31
      ,sales_date +  2    MONTH  AS d1     <= 1996-02-29
      ,sales_date +  3    MONTHS AS d2     <= 1996-03-31
      ,sales_date +  2    MONTH
                  +  1    MONTH  AS d3     <= 1996-03-29
      ,sales_date + (2+1) MONTHS AS d4     <= 1996-03-31
FROM   sales
WHERE  sales_person = 'GOUNOT'
  AND  sales_date   = '1995-12-31';

When one date/time value is subtracted from another date/time value the result is a date, time,
or timestamp duration. This decimal value expresses the difference thus:
DURATION-TYPE  FORMAT         NUMBER-REPRESENTS      USE-WITH-D-TYPE
=============  =============  =====================  ===============
DATE           DECIMAL(8,0)   yyyymmdd               TIMESTAMP, DATE
TIME           DECIMAL(6,0)   hhmmss                 TIMESTAMP, TIME
TIMESTAMP      DECIMAL(20,6)  yyyymmddhhmmss.zzzzzz  TIMESTAMP

SELECT   empno
        ,hiredate
        ,birthdate
        ,hiredate - birthdate
FROM     employee
WHERE    workdept = 'D11'
  AND    lastname < 'L'
ORDER BY empno;

ANSWER
====================================
EMPNO  HIREDATE   BIRTHDATE
------ ---------- ---------- -------
000150 1972-02-12 1947-05-17 240826.
000200 1966-03-03 1941-05-29 240905.
000210 1979-04-11 1953-02-23 260116.

                                      ANSWER
                                      ==========
SELECT hiredate                    <= 1972-02-12
      ,hiredate - 12345678.        <= 0733-03-26
      ,hiredate - 1234 years
                - 56   months
                - 78   days        <= 0733-03-26
FROM   employee
WHERE  empno = '000150';

One date/time can be subtracted (only) from another valid date/time value. The result is a
date/time duration value. Figure 30 above has an example.
DB2 Special Registers
A special register is a DB2 variable that contains information about the state of the system.
The complete list follows:
SPECIAL REGISTER                                 UPDATE  DATA-TYPE
===============================================  ======  =============
CURRENT CLIENT_ACCTNG                            no      VARCHAR(255)
CURRENT CLIENT_APPLNAME                          no      VARCHAR(255)
CURRENT CLIENT_USERID                            no      VARCHAR(255)
CURRENT CLIENT_WRKSTNNAME                        no      VARCHAR(255)
CURRENT DATE                                     no      DATE
CURRENT DBPARTITIONNUM                           no      INTEGER
CURRENT DEFAULT TRANSFORM GROUP                  yes     VARCHAR(18)
CURRENT DEGREE                                   yes     CHAR(5)
CURRENT EXPLAIN MODE                             yes     VARCHAR(254)
CURRENT EXPLAIN SNAPSHOT                         yes     CHAR(8)
CURRENT ISOLATION                                yes     CHAR(2)
CURRENT LOCK TIMEOUT                             yes     INTEGER
CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION  yes     VARCHAR(254)
CURRENT PACKAGE PATH                             yes     VARCHAR(4096)
CURRENT PATH                                     yes     VARCHAR(254)
CURRENT QUERY OPTIMIZATION                       yes     INTEGER
CURRENT REFRESH AGE                              yes     DECIMAL(20,6)
CURRENT SCHEMA                                   yes     VARCHAR(128)
CURRENT SERVER                                   no      VARCHAR(18)
CURRENT TIME                                     no      TIME
CURRENT TIMESTAMP                                no      TIMESTAMP
CURRENT TIMEZONE                                 no      DECIMAL(6,0)
CURRENT USER                                     no      VARCHAR(128)
SESSION_USER                                     yes     VARCHAR(128)
SYSTEM_USER                                      no      VARCHAR(128)
USER                                             yes     VARCHAR(128)

Some special registers can be referenced using an underscore instead of a blank in the
name - as in: CURRENT_DATE.
Some special registers can be updated using the SET command (see list above).
All special registers can be queried using the SET command. They can also be referenced
in ordinary SQL statements.
Those special registers that automatically change over time (e.g. current timestamp) are
always the same for the duration of a given SQL statement. So if one inserts a thousand
rows in a single insert, all will get the same current timestamp.
One can reference the current timestamp in an insert or update, to record in the target
table when the row was changed. To see the value assigned, query the DML statement.
See page 54 for details.
Refer to the DB2 SQL Reference Volume 1 for a detailed description of each register.
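A few of the points above can be sketched in SQL. This is an illustration only: the output column names are made up, SYSIBM.SYSDUMMY1 is used simply as a one-row table, and the optimization level 3 is an arbitrary value.

```sql
-- Reference registers in an ordinary SQL statement. CURRENT DATE
-- is written both with a blank and with an underscore.
SELECT CURRENT DATE      AS cur_date1
      ,CURRENT_DATE      AS cur_date2
      ,CURRENT TIMESTAMP AS cur_ts
      ,CURRENT SCHEMA    AS cur_schema
FROM   sysibm.sysdummy1;

-- Change one of the updatable registers, then look at it again.
SET CURRENT QUERY OPTIMIZATION = 3;

VALUES CURRENT QUERY OPTIMIZATION;
```

Both CUR_DATE1 and CUR_DATE2 should show the same value, because registers that change over time are fixed for the duration of a single statement.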
Distinct Types
A distinct data type is a field type that is derived from one of the base DB2 field types. It is
used when one wants to prevent users from combining two separate columns that should
never be manipulated together (e.g. adding US dollars to Japanese Yen).
One creates a distinct (data) type using the following syntax:
CREATE DISTINCT TYPE type-name AS source-type WITH COMPARISONS
The creation of a distinct type, under the covers, results in the creation of two implied functions
that can be used to convert data to and from the source type and the distinct type. Support for
the basic comparison operators (=, <>, <, <=, >, and >=) is also provided.
Below is a typical create and drop statement:
CREATE DISTINCT TYPE JAP_YEN AS DECIMAL(15,2) WITH COMPARISONS;
DROP DISTINCT TYPE JAP_YEN;

SELECT   id
        ,usa_sales + eur_sales AS tot_sales
FROM     customer;

SELECT   id
        ,DECIMAL(usa_sales) +
         DECIMAL(eur_sales) AS tot_sales
FROM     customer;
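To make the two implied conversion functions more concrete, here is a minimal sketch. The type names (USA_DOLLARS, EUR_EUROS) and the table SALES_DEMO are hypothetical, and are used only for illustration.

```sql
-- Hypothetical distinct types, both sourced on the same base type.
CREATE DISTINCT TYPE usa_dollars AS DECIMAL(9,2) WITH COMPARISONS;
CREATE DISTINCT TYPE eur_euros   AS DECIMAL(9,2) WITH COMPARISONS;

-- Hypothetical table that uses the two types.
CREATE TABLE sales_demo
(id         INTEGER      NOT NULL
,usa_sales  usa_dollars
,eur_sales  eur_euros
,PRIMARY KEY(id));

-- Source type to distinct type: call the function that has the
-- same name as the distinct type.
INSERT INTO sales_demo VALUES
(1, usa_dollars(100.00), eur_euros(50.00));

-- Distinct type to source type: call the function that has the
-- same name as the source type.
SELECT   id
        ,DECIMAL(usa_sales) + DECIMAL(eur_sales) AS tot_sales
FROM     sales_demo;

-- The next statement is expected to fail, because no "+" operator
-- is defined between the two distinct types:
-- SELECT id, usa_sales + eur_sales AS tot_sales FROM sales_demo;
```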
A SELECT statement is used to query the database. It has the following components, not all
of which need be used in any particular query:
SELECT clause. One of these is required, and it must return at least one item, be it a column, a literal, the result of a function, or something else. One must also access at least
one table, be that a true table, a temporary table, a view, or an alias.
WITH clause. This clause is optional. Use this phrase to include independent SELECT
statements that are subsequently accessed in a final SELECT (see page 280).
ORDER BY clause. Optionally, order the final output (see page 185).
FETCH FIRST clause. Optionally, stop the query after "n" rows (see page 27). If an optimize-for value is also provided, both values are used independently by the optimizer.
READ-ONLY clause. Optionally, state that the query is read-only. Some queries are inherently read-only, in which case this option has no effect.
FOR UPDATE clause. Optionally, state that the query will be used to update certain columns that are returned during fetch processing.
OPTIMIZE FOR n ROWS clause. Optionally, tell the optimizer to tune the query assuming that not all of the matching rows will be retrieved. If a first-fetch value is also provided, both values are used independently by the optimizer.
Refer to the IBM manuals for a complete description of all of the above. Some of the more
interesting options are described below.
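To show how the optional clauses listed above fit together, here is a minimal sketch against the STAFF sample table. The common table expression name (DEPT_SIZE) and its column NUM_STAFF are made up for illustration.

```sql
-- WITH clause: an independent select, referenced by the final select.
WITH dept_size (dept, num_staff) AS
  (SELECT   dept
           ,COUNT(*)
   FROM     staff
   GROUP BY dept
  )
SELECT   s.id
        ,s.name
        ,d.num_staff
FROM     staff     s
        ,dept_size d
WHERE    s.dept = d.dept
ORDER BY s.id                  -- order the final output
FETCH FIRST 5 ROWS ONLY        -- stop after five rows
FOR READ ONLY                  -- read-only clause
OPTIMIZE FOR 5 ROWS;           -- hint to the optimizer
```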
[WITH common-table-expression [, ...]]
SELECT statement
[ORDER BY clause]
[FETCH FIRST clause]
[READ-ONLY clause | FOR UPDATE clause]
[OPTIMIZE FOR clause]
Every query must have at least one SELECT statement, and it must return at least one item,
and access at least one object.
SELECT {* | an item [, an item ...]}
FROM   {table name | view name | alias name | ( full select )} [[AS] correlation name] [, ...]
[WHERE expression [{AND | OR} expression ...]]
Literal: A literal value (e.g. "ABC"). Use the AS expression to name the literal.
FROM Objects
Sample SQL
SELECT   deptno
        ,admrdept
        ,'ABC' AS abc
FROM     department
WHERE    deptname LIKE '%ING%'
ORDER BY 1;

ANSWER
===================
DEPTNO ADMRDEPT ABC
------ -------- ---
B01    A00      ABC
D11    D01      ABC

SELECT   *
FROM     department
WHERE    deptname LIKE '%ING%'
ORDER BY 1;

SELECT   deptno
        ,department.*
FROM     department
WHERE    deptname LIKE '%ING%'
ORDER BY 1;

SELECT   department.*
        ,department.*
FROM     department
WHERE    deptname LIKE '%NING%'
ORDER BY 1;
The fetch first clause limits the cursor to retrieving "n" rows. If the clause is specified and no
number is provided, the query will stop after the first fetch.
FETCH FIRST [integer] {ROW | ROWS} ONLY

SELECT   years
        ,name
        ,id
FROM     staff
FETCH FIRST 3 ROWS ONLY;

ANSWER
=====================
YEARS  NAME      ID
------ --------- ----
     7 Sanders     10
     8 Pernal      20
     5 Marenghi    30
Figure 48, FETCH FIRST without ORDER BY, gets random rows
WARNING: Using the FETCH FIRST clause to get the first "n" rows can sometimes return
an answer that is not what the user really intended. See below for details.
If an ORDER BY is provided, then the FETCH FIRST clause can be used to stop the query
after a certain number of what are, perhaps, the most desirable rows have been returned.
However, the phrase should only be used in this manner when the related ORDER BY
uniquely identifies each row returned.
To illustrate what can go wrong, imagine that we wanted to query the STAFF table in order to
get the names of those three employees that have worked for the firm the longest - in order to
give them a little reward (or possibly to fire them). The following query could be run:
SELECT   years
        ,name
        ,id
FROM     staff
WHERE    years IS NOT NULL
ORDER BY years DESC
FETCH FIRST 3 ROWS ONLY;

ANSWER
=====================
YEARS  NAME      ID
------ --------- ----
    13 Graham     310
    12 Jones      260
    10 Hanes       50
Figure 49, FETCH FIRST with ORDER BY, gets wrong answer
The above query answers the question correctly, but the question was wrong, and so the answer is wrong. The problem is that there are two employees that have worked for the firm for
ten years, but only one of them shows, and the one that does show was picked at random by
the query processor. This is almost certainly not what the business user intended.
The next query is similar to the previous, but now the ORDER BY uniquely identifies each
row returned (presumably as per the end-user's instructions):
SELECT   years
        ,name
        ,id
FROM     staff
WHERE    years IS NOT NULL
ORDER BY years DESC
        ,id    DESC
FETCH FIRST 3 ROWS ONLY;

ANSWER
=====================
YEARS  NAME      ID
------ --------- ----
    13 Graham     310
    12 Jones      260
    10 Quill      290
Figure 50, FETCH FIRST with ORDER BY, gets right answer
WARNING: Getting the first "n" rows from a query is actually quite a complicated problem. Refer to page 100 for a more complete discussion.
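If the business user would rather see every employee who ties for third place, one option is to rank the rows and then filter on the rank. The following is a minimal sketch of that idea, not necessarily the approach used later in this book. The column name RNK and the correlation name XXX are arbitrary.

```sql
-- RANK gives tied rows the same rank, so both of the employees
-- with ten years of service get rank 3 and both are returned.
SELECT   years
        ,name
        ,id
FROM    (SELECT years
               ,name
               ,id
               ,RANK() OVER(ORDER BY years DESC) AS rnk
         FROM   staff
         WHERE  years IS NOT NULL
        )AS xxx
WHERE    rnk <= 3
ORDER BY years DESC
        ,id;
```

Note that this query can return more than three rows, which may or may not be what is wanted.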
Correlation Name
The correlation name is defined in the FROM clause and relates to the preceding object name.
In some cases, it is used to provide a short form of the related object name. In other situations,
it is required in order to uniquely identify logical tables when a single physical table is referred to twice in the same query. Some sample SQL follows:
SELECT   a.empno
        ,a.lastname
FROM     employee a
        ,(SELECT MAX(empno) AS empno
          FROM   employee) AS b
WHERE    a.empno = b.empno;

ANSWER
=================
EMPNO  LASTNAME
------ ----------
000340 GOUNOT

SELECT   a.empno
        ,a.lastname
        ,b.deptno AS dept
FROM     employee   a
        ,department b
WHERE    a.workdept  = b.deptno
  AND    a.job      <> 'SALESREP'
  AND    b.deptname  = 'OPERATIONS'
  AND    a.sex      IN ('M','F')
  AND    b.location IS NULL
ORDER BY 1;

ANSWER
======================
EMPNO  LASTNAME   DEPT
------ ---------- ----
000090 HENDERSON  E11
000280 SCHNEIDER  E11
000290 PARKER     E11
000300 SMITH      E11
000310 SETRIGHT   E11
The AS phrase can be used in a SELECT list to give a field a different name. If the new name
is an invalid field name (e.g. contains embedded blanks), then place the name in quotes:
SELECT   empno   AS e_num
        ,midinit AS "m int"
        ,phoneno AS "..."
FROM     employee
WHERE    empno < '000030'
ORDER BY 1;

ANSWER
===================
E_NUM  M INT ...
------ ----- ----
000010 I     3978
000020 L     3476
The new field name is known outside of the full-select of nested table expressions and common table expressions, and in a view definition.
CREATE VIEW emp2 AS
SELECT   empno   AS e_num
        ,midinit AS "m int"
        ,phoneno AS "..."
FROM     employee;

SELECT   *
FROM     emp2
WHERE    "..." = '3978';

ANSWER
===================
E_NUM  M INT ...
------ ----- ----
000010 I     3978
In SQL something can be true, false, or null. This three-way logic has to always be considered when accessing data. To illustrate, if we first select all the rows in the STAFF table
where the SALARY is < $10,000, then all the rows where the SALARY is >= $10,000, we
have not necessarily found all the rows in the table because we have yet to select those rows
where the SALARY is null.
The presence of null values in a table can also impact the various column functions. For example, the AVG function ignores null values when calculating the average of a set of rows.
This means that a user-calculated average may give a different result from a DB2 calculated
equivalent:
SELECT   AVG(comm)            AS a1
        ,SUM(comm) / COUNT(*) AS a2
FROM     staff
WHERE    id < 100;

ANSWER
===============
A1      A2
------- ------
796.025 530.68

SELECT   COUNT(*)      AS num
        ,MAX(lastname) AS max
FROM     employee
WHERE    firstnme = 'FRED';

ANSWER
========
NUM MAX
--- ---
  0 -
Figure 56, Getting a NULL value from a field defined NOT NULL
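The three-way logic described above can be seen by counting the rows that fall into each of the three possible outcomes of a comparison. The following is a minimal sketch using the COMM field in the STAFF table (which can be null). The threshold of 500 and the output names are arbitrary.

```sql
-- A row where COMM is null satisfies neither of the first two
-- checks, so LT_500 + GE_500 can be less than TOTAL.
SELECT   SUM(CASE WHEN comm <  500    THEN 1 ELSE 0 END) AS lt_500
        ,SUM(CASE WHEN comm >= 500    THEN 1 ELSE 0 END) AS ge_500
        ,SUM(CASE WHEN comm IS NULL   THEN 1 ELSE 0 END) AS is_null
        ,COUNT(*)                                        AS total
FROM     staff;
```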
Why Nulls Exist
Null values can represent two kinds of data. In the first case, the value is unknown (e.g. we do not
know the name of the person's spouse). Alternatively, the value is not relevant to the situation
(e.g. the person does not have a spouse).
Many people prefer not to have to bother with nulls, so they use instead a special value when
necessary (e.g. an unknown employee name is blank). This trick works OK with character
data, but it can lead to problems when used on numeric values (e.g. an unknown salary is set
to zero).
Locating Null Values
One cannot use an equal predicate to locate those values that are null, because a null value
does not actually equal anything, not even null; it is simply null. The IS NULL or IS NOT
NULL phrases are used instead. The following example gets the average commission of only
those rows that are not null. Note that the second result differs from the first due to rounding
loss.
SELECT   AVG(comm)            AS a1
        ,SUM(comm) / COUNT(*) AS a2
FROM     staff
WHERE    id < 100
  AND    comm IS NOT NULL;

ANSWER
===============
A1      A2
------- ------
796.025 796.02
To write a string, put it in quotes. If the string contains quotes, each quote is represented by a
pair of quotes:
SELECT   'JOHN'          AS J1
        ,'JOHN''S'       AS J2
        ,'''JOHN''S'''   AS J3
        ,'"JOHN''S"'     AS J4
FROM     staff
WHERE    id = 10;

ANSWER
=============================
J1   J2     J3       J4
---- ------ -------- --------
JOHN JOHN'S 'JOHN'S' "JOHN'S"

SELECT   id     AS "USER ID"
        ,dept   AS "D#"
        ,years  AS "#Y"
        ,'ABC'  AS "'TXT'"
        ,'"'    AS """quote"" fld"
FROM     staff s
WHERE    id < 40
ORDER BY "USER ID";

ANSWER
===============================
USER ID D# #Y 'TXT' "quote" fld
------- -- -- ----- -----------
     10 20  7 ABC   "
     20 20  8 ABC   "
     30 38  5 ABC   "
SQL Predicates
A predicate is used in either the WHERE or HAVING clauses of a SQL statement. It specifies a condition that is true, false, or unknown about a row or a group.
Basic Predicate
A basic predicate compares two values. If either value is null, the result is unknown. Otherwise the result is either true or false.
[NOT] expression {= | <> | < | > | <= | >=} expression

ANSWER
===============
ID  JOB  DEPT
--- ---- ----
 10 Mgr    20
 30 Mgr    38
 50 Mgr    15
140 Mgr    51
[NOT] ( expression [, expression ...] ) = ( expression [, expression ...] )

ANSWER
===========
ID DEPT JOB
-- ---- ---
30   38 Mgr
ANSWER
===========
ID DEPT JOB
-- ---- ---
30   38 Mgr
dept =
28)
years =
7)
job
= Mgr)
[NOT] expression {= | <> | < | > | <= | >=} {SOME | ANY | ALL} ( fullselect )
[NOT] ( expression [, expression ...] ) = {SOME | ANY} ( fullselect )
SELECT   id, job
FROM     staff
WHERE    job  = ANY (SELECT job FROM staff)
  AND    id  <= ALL (SELECT id FROM staff)
ORDER BY id;

ANSWER
========
ID  JOB
--- ---
 10 Mgr

ANSWER
==============
ID  DEPT JOB
--- ---- -----
 20   20 Sales
expression [NOT] BETWEEN low val. AND high val.

ANSWER
=========
ID  JOB
--- -----
 10 Mgr
 20 Sales
 30 Mgr
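The BETWEEN predicate includes both of the end values. The following minimal sketch, run against the STAFF sample table, shows the predicate alongside its long-hand equivalent. These statements are illustrations only, and are not necessarily the query that generated the answer above.

```sql
-- These two queries are equivalent: BETWEEN includes both ends.
SELECT   id, job
FROM     staff
WHERE    id BETWEEN 10 AND 30
ORDER BY id;

SELECT   id, job
FROM     staff
WHERE    id >= 10
  AND    id <= 30
ORDER BY id;

-- NOT BETWEEN finds the rows that lie outside of the range.
SELECT   id, job
FROM     staff
WHERE    id NOT BETWEEN 10 AND 30
ORDER BY id;
```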
EXISTS Predicate
[NOT] EXISTS ( fullselect )

ANSWER
=========
ID  JOB
--- -----
 10 Mgr
 20 Sales
 30 Mgr
 40 Sales
IN Predicate
expression [NOT] IN { ( fullselect ) | ( expression [, expression ...] ) | expression }
( expression [, expression ...] ) [NOT] IN ( fullselect )

SELECT   id, job
FROM     staff a
WHERE    id IN (10,20,30)
  AND    id IN (SELECT id
                FROM   staff)
  AND    id NOT IN 99
ORDER BY id;

ANSWER
=========
ID  JOB
--- -----
 10 Mgr
 20 Sales
 30 Mgr
SELECT   empno, lastname
FROM     employee
WHERE    (empno, 'AD3113') IN
         (SELECT empno, projno
          FROM   emp_act
          WHERE  emptime > 0.5)
ORDER BY 1;

ANSWER
===============
EMPNO  LASTNAME
------ --------
000260 JOHNSON
000270 PEREZ
LIKE Predicate
expression [NOT] LIKE pattern [ESCAPE pattern]
LIKE AB_D%
LIKE _X
LIKE %X
ANSWER
==============
ID  NAME
--- ---------
130 Yamaguchi
200 Scoutten
The escape character in a LIKE statement enables one to check for percent signs and/or underscores in the search string. When used, it precedes the % or _ in the search string indicating that it is the actual value and not the special character which is to be checked for.
When processing the LIKE pattern, DB2 works thus: Any pair of escape characters is treated
as the literal value (e.g. "++" means the string "+"). Any single occurrence of an escape character followed by either a "%" or a "_" means the literal "%" or "_" (e.g. "+%" means the
string "%"). Any other "%" or "_" is used as in a normal LIKE pattern.
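LIKE STATEMENT TEXT            WHAT VALUES MATCH
===========================    ======================
LIKE 'AB%'                     Finds AB, any string
LIKE 'AB%'      ESCAPE '+'     Finds AB, any string
LIKE 'AB+%'     ESCAPE '+'     Finds AB%
LIKE 'AB++'     ESCAPE '+'     Finds AB+
LIKE 'AB+%%'    ESCAPE '+'     Finds AB%, any string
LIKE 'AB++%'    ESCAPE '+'     Finds AB+, any string
LIKE 'AB+++%'   ESCAPE '+'     Finds AB+%
LIKE 'AB+++%%'  ESCAPE '+'     Finds AB+%, any string
LIKE 'AB+%+%%'  ESCAPE '+'     Finds AB%%, any string
LIKE 'AB++++'   ESCAPE '+'     Finds AB++
LIKE 'AB+++++%' ESCAPE '+'     Finds AB++%
LIKE 'AB++++%'  ESCAPE '+'     Finds AB++, any string
LIKE 'AB+%++%'  ESCAPE '+'     Finds AB%+, any string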
SELECT   id
FROM     staff
WHERE    id = 10
  AND    'ABC' LIKE 'AB%'
  AND    'A%C' LIKE 'A/%C'  ESCAPE '/'
  AND    'A_C' LIKE 'A\_C'  ESCAPE '\'
  AND    'A_$' LIKE 'A$_$$' ESCAPE '$';

ANSWER
======
ID
--
10
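The escape rules given above can be checked against a handful of test strings. Here is a minimal sketch. The temporary table TEMP1 and its contents are made up for illustration.

```sql
-- Four test strings, held in a temporary table.
WITH temp1 (txt) AS
(VALUES ('AB')
       ,('AB%')
       ,('AB+')
       ,('ABC')
)
SELECT   txt
FROM     temp1
-- The "+%" means a literal percent sign, so this pattern should
-- match the string AB% and nothing else.
WHERE    txt LIKE 'AB+%' ESCAPE '+';
```

Changing the pattern to 'AB%' (with no escape) should, by the same rules, match all four strings.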
NULL Predicate
The NULL predicate checks for null values. The result of this predicate cannot be unknown.
If the value of the expression is null, the result is true. If the value of the expression is not
null, the result is false.
[NOT] expression IS [NOT] NULL

ANSWER
=========
ID  COMM
--- ----
 10    -
 30    -
 50    -
To refer to a special character in a predicate, or anywhere else in a SQL statement, use the
"X" notation to substitute with the ASCII hex value. For example, the following query will
list all names in the STAFF table that have an "a" followed by a semi-colon:
SELECT   id
        ,name
FROM     staff
WHERE    name LIKE '%a' || X'3B' || '%'
ORDER BY id;
Expressions within parentheses are done first, then prefix operators (e.g. -1), then multiplication and division, then addition and subtraction. When two operations of equal precedence are
together (e.g. 1 * 5 / 4) they are done from left to right.
Example:    555  +  -22  /  (12 - 3)  *  66
                 ^  ^    ^      ^     ^
                5th 2nd 3rd    1st   4th

ANSWER
======
423
SELECT                (12 - 3)       AS int1
        ,       -22 / (12 - 3)       AS int2
        ,       -22 / (12 - 3) * 66  AS int3
        ,555 +  -22 / (12 - 3) * 66  AS int4
FROM     sysibm.sysdummy1;

ANSWER
===================
INT1 INT2 INT3 INT4
---- ---- ---- ----
   9   -2 -132  423
SELECT                (12.0 - 3)       AS dec1
        ,       -22 / (12.0 - 3)       AS dec2
        ,       -22 / (12.0 - 3) * 66  AS dec3
        ,555 +  -22 / (12.0 - 3) * 66  AS dec4
FROM     sysibm.sysdummy1;

ANSWER
===========================
DEC1   DEC2   DEC3   DEC4
------ ------ ------ ------
   9.0   -2.4 -161.3  393.6
TABLE1
+---------+
|COL1|COL2|
|----|----|
|A   |AA  |
|B   |BB  |
|C   |CC  |
+---------+

SELECT   *
FROM     table1
WHERE    col1  = 'C'
  AND    col1 >= 'A'
   OR    col2 >= 'AA'
ORDER BY col1;

ANSWER>>  COL1 COL2
          ---- ----
          A    AA
          B    BB
          C    CC

SELECT   *
FROM     table1
WHERE   (col1  = 'C'
  AND    col1 >= 'A')
   OR    col2 >= 'AA'
ORDER BY col1;

ANSWER>>  COL1 COL2
          ---- ----
          A    AA
          B    BB
          C    CC

SELECT   *
FROM     table1
WHERE    col1  = 'C'
  AND   (col1 >= 'A'
   OR    col2 >= 'AA')
ORDER BY col1;

ANSWER>>  COL1 COL2
          ---- ----
          C    CC
CAST Expression
The CAST expression is used to convert one data type to another. It is similar to the various
field-type functions (e.g. CHAR, SMALLINT) except that it can also handle null values and
host-variable parameter markers.
CAST ( {expression | NULL | parameter marker} AS data-type )
EXPRESSION: If the input is neither null, nor a parameter marker, the input data-type is
converted to the output data-type. Truncation and/or padding with blanks occur as required. An error is generated if the conversion is illegal.
NULL: If the input is null, the output is a null value of the specified type.
PARAMETER MARKER: This option is only used in programs and need not concern us
here. See the DB2 SQL Reference for details.
Examples
Use the CAST expression to convert the SALARY field from decimal to integer:
SELECT   id
        ,salary
        ,CAST(salary AS INTEGER) AS sal2
FROM     staff
WHERE    id < 30
ORDER BY id;

ANSWER
=================
ID SALARY   SAL2
-- -------- -----
10 18357.50 18357
20 18171.25 18171

SELECT   id
        ,job
        ,CAST(job AS CHAR(3)) AS job2
FROM     staff
WHERE    id < 30
ORDER BY id;

ANSWER
=============
ID JOB   JOB2
-- ----- ----
10 Mgr   Mgr
20 Sales Sal

SELECT   id
        ,CAST(NULL AS SMALLINT) AS junk
FROM     staff
WHERE    id < 30
ORDER BY id;

ANSWER
=======
ID JUNK
-- ----
10    -
20    -
Figure 89, Use CAST expression to define SMALLINT field with null values
The CAST expression can also be used in a join, where the field types being matched differ:
SELECT   stf.id
        ,emp.empno
FROM     staff stf
LEFT OUTER JOIN
         employee emp
ON       stf.id  = CAST(emp.empno AS SMALLINT)
AND      emp.job = 'MANAGER'
WHERE    stf.id  < 60
ORDER BY stf.id;

ANSWER
=========
ID EMPNO
-- ------
10 -
20 000020
30 000030
40 -
50 000050

SELECT   stf.id
        ,emp.empno
FROM     staff stf
LEFT OUTER JOIN
         employee emp
ON       stf.id  = SMALLINT(emp.empno)
AND      emp.job = 'MANAGER'
WHERE    stf.id  < 60
ORDER BY stf.id;

ANSWER
=========
ID EMPNO
-- ------
10 -
20 000020
30 000030
40 -
50 000050
VALUES Clause
The VALUES clause is used to define a set of rows and columns with explicit values. The
clause is commonly used in temporary tables, but can also be used in view definitions. Once
defined in a table or view, the output of the VALUES clause can be grouped by, joined to,
and otherwise used as if it is an ordinary table - except that it can not be updated.
VALUES { expression | NULL | ( {expression | NULL} [, ...] ) } [, ...]

VALUES   6                            <= 1 row,  1 column
VALUES  (6)                           <= 1 row,  1 column
VALUES   6, 7, 8                      <= 1 row,  3 columns
VALUES  (6), (7), (8)                 <= 3 rows, 1 column
VALUES  (6,66), (7,77), (8,NULL)      <= 3 rows, 2 columns
The next statement shall define a temporary table containing two columns and three rows.
The first column will default to type integer and the second to type varchar.
WITH temp1 (col1, col2) AS
(VALUES (0, 'AA')
       ,(1, 'BB')
       ,(2, NULL)
)
SELECT   *
FROM     temp1;

ANSWER
=========
COL1 COL2
---- ----
   0 AA
   1 BB
   2 -
ANSWER
=========
COL1 COL2
---- ----
 0.0 AA
 1.0 BB
 2.0 -

ANSWER
=========
COL1 COL2
---- ----
   0 A
   1 B
   2 -
Temporary tables, or (permanent) views, defined using the VALUES expression can be used
much like a DB2 table. They can be joined, unioned, and selected from. They can not, however, be updated, or have indexes defined on them. Temporary tables can not be used in a
sub-query.
WITH temp1 (col1, col2, col3) AS
(VALUES (0, 'AA', 0.00)
       ,(1, 'BB', 1.11)
       ,(2, 'CC', 2.22)
)
,temp2 (col1b, colx) AS
(SELECT  col1
        ,col1 + col3
 FROM    temp1
)
SELECT   *
FROM     temp2;

ANSWER
==========
COL1B COLX
----- ----
    0 0.00
    1 2.11
    2 4.22

ANSWER
======
COL1
----
   0
   1
   2
   3
 etc
Figure 100, Use VALUES defined data to seed a recursive SQL statement
All of the above examples have matched a VALUES statement up with a prior WITH expression, so as to name the generated columns. One doesn't have to use the latter, but if you don't,
you get a table with unnamed columns, which is pretty useless:
SELECT   *
FROM    (VALUES (123,'ABC')
               ,(234,'DEF')
        )AS ttt
ORDER BY 1 DESC;

ANSWER
======
--- ---
234 DEF
123 ABC
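As noted earlier, a temporary table defined using VALUES can be joined to an ordinary table. The following is a minimal sketch of such a join. The temporary table name (DEPT_NAMES) and the department descriptions are invented for illustration; only the STAFF table comes from the sample database.

```sql
-- Hypothetical lookup table, defined in-line using VALUES.
WITH dept_names (dept, dname) AS
(VALUES (10, 'HEAD OFFICE')
       ,(15, 'EAST')
       ,(20, 'WEST')
)
-- Join the lookup table to a real table. Staff in departments
-- that are not listed above are simply not returned.
SELECT   s.id
        ,s.name
        ,d.dname
FROM     staff      s
        ,dept_names d
WHERE    s.dept = d.dept
  AND    s.id   < 60
ORDER BY s.id;
```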
CASE Expression
WARNING: The sequence of the CASE conditions can affect the answer. The first WHEN
check that matches is the one used.
CASE expressions enable one to do if-then-else type processing inside of SQL statements.
There are two general flavors of the expression. In the first kind, each WHEN statement does
its own independent checking. In the second kind, all of the WHEN conditions are used to do
"equal" checks against a common reference expression. With both flavors, the first WHEN
that matches is the one chosen.
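CASE
   WHEN search-condition THEN {result | NULL}
   [WHEN ...]
   [ELSE {result | NULL}]
END

CASE expression
   WHEN expression THEN {result | NULL}
   [WHEN ...]
   [ELSE {result | NULL}]
END

If no ELSE is given, ELSE NULL is assumed.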
If more than one WHEN condition is true, the first one processed that matches is used.
If no WHEN matches, the value in the ELSE clause applies. If no WHEN matches and
there is no ELSE clause, the result is NULL.
There must be at least one non-null result in a CASE statement. Failing that, one of the
NULL results must be inside of a CAST expression.
Functions that have an external action (e.g. RAND) can not be used in the expression part
of a CASE statement.
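Two of the rules above (no ELSE means NULL, and an all-null CASE needs a CAST) can be sketched as follows. The salary thresholds and the output names are arbitrary, and are used for illustration only.

```sql
SELECT   id
        ,salary
         -- No ELSE clause: rows that match neither WHEN get a null.
        ,CASE
            WHEN salary > 20000 THEN 'HIGH'
            WHEN salary > 15000 THEN 'MEDIUM'
         END AS band
         -- Every result is null, so one of them is placed inside
         -- a CAST in order to give the field a data type.
        ,CASE
            WHEN salary > 20000 THEN CAST(NULL AS CHAR(4))
            ELSE NULL
         END AS no_value
FROM     staff
WHERE    id < 60
ORDER BY id;
```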
CASE Flavours
The following CASE is of the kind where each WHEN does an equal check against a common expression - in this example, the current value of SEX.
SELECT   lastname
        ,sex AS sx
        ,CASE sex
            WHEN 'F' THEN 'FEMALE'
            WHEN 'M' THEN 'MALE'
            ELSE NULL
         END AS sexx
FROM     employee
WHERE    lastname LIKE 'J%'
ORDER BY 1;

ANSWER
====================
LASTNAME   SX SEXX
---------- -- ------
JEFFERSON  M  MALE
JOHNSON    F  FEMALE
JONES      M  MALE

SELECT   lastname
        ,sex AS sx
        ,CASE
            WHEN sex = 'F' THEN 'FEMALE'
            WHEN sex = 'M' THEN 'MALE'
            ELSE NULL
         END AS sexx
FROM     employee
WHERE    lastname LIKE 'J%'
ORDER BY 1;

ANSWER
====================
LASTNAME   SX SEXX
---------- -- ------
JEFFERSON  M  MALE
JOHNSON    F  FEMALE
JONES      M  MALE
SELECT   lastname
        ,midinit AS mi
        ,sex     AS sx
        ,CASE
            WHEN midinit > SEX
            THEN midinit
            ELSE sex
         END AS mx
FROM     employee
WHERE    lastname LIKE 'J%'
ORDER BY 1;

ANSWER
===================
LASTNAME   MI SX MX
---------- -- -- --
JEFFERSON  J  M  M
JOHNSON    P  F  P
JONES      T  M  T
SELECT   COUNT(*)                                   AS tot
        ,SUM(CASE sex WHEN 'F' THEN 1 ELSE 0 END)   AS #f
        ,SUM(CASE sex WHEN 'M' THEN 1 ELSE 0 END)   AS #m
FROM     employee
WHERE    lastname LIKE 'J%';

ANSWER
=========
TOT #F #M
--- -- --
  3  1  2
SELECT   lastname
        ,sex
FROM     employee
WHERE    lastname LIKE 'J%'
  AND    CASE sex
            WHEN 'F' THEN ''
            WHEN 'M' THEN ''
            ELSE NULL
         END IS NOT NULL
ORDER BY 1;

ANSWER
==============
LASTNAME   SEX
---------- ---
JEFFERSON  M
JOHNSON    F
JONES      M
SELECT   lastname
        ,LENGTH(RTRIM(lastname)) AS len
        ,SUBSTR(lastname,1,
            CASE
               WHEN LENGTH(RTRIM(lastname))
                    > 6 THEN 6
               ELSE LENGTH(RTRIM(lastname))
            END ) AS lastnm
FROM     employee
WHERE    lastname LIKE 'J%'
ORDER BY 1;

ANSWER
=====================
LASTNAME   LEN LASTNM
---------- --- ------
JEFFERSON    9 JEFFER
JOHNSON      7 JOHNSO
JONES        5 JONES
1.1
1.2
5 THEN comm * 1.3
5 THEN comm * 1.4
ANSWER
========
C1 C2 C3
-- -- --
88  9  9
44  3 14
22  0  -
 0  1  0

ANSWER
============
NAME    DUMB
------- ----
Sanders -
Pernal  -
The case WHEN checks are always processed in the order that they are found. The first one
that matches is the one used. This means that the answer returned by the query can be affected
by the sequence on the WHEN checks. To illustrate this, the next statement uses the SEX
field (which is always either "F" or "M") to create a new field called SXX. In this particular
example, the SQL works as intended.
SELECT   lastname
        ,sex
        ,CASE
            WHEN sex >= 'M' THEN 'MAL'
            WHEN sex >= 'F' THEN 'FEM'
         END AS sxx
FROM     employee
WHERE    lastname LIKE 'J%'
ORDER BY 1;

ANSWER
=================
LASTNAME   SX SXX
---------- -- ---
JEFFERSON  M  MAL
JOHNSON    F  FEM
JONES      M  MAL

SELECT   lastname
        ,sex
        ,CASE
            WHEN sex >= 'F' THEN 'FEM'
            WHEN sex >= 'M' THEN 'MAL'
         END AS sxx
FROM     employee
WHERE    lastname LIKE 'J%'
ORDER BY 1;

ANSWER
=================
LASTNAME   SX SXX
---------- -- ---
JEFFERSON  M  FEM
JOHNSON    F  FEM
JONES      M  FEM
A special kind of SELECT statement (see page 54) can encompass an INSERT, UPDATE, or
DELETE statement to get the before or after image of whatever rows were changed (e.g. select the list of rows deleted). This kind of SELECT can be very useful when the DML statement is internally generating a value that one needs to know (e.g. an INSERT automatically
creates a new invoice number using a sequence column).
Insert
The INSERT statement is used to insert rows into a table, view, or full-select. To illustrate
how it is used, this section will use the EMP_ACT sample table, which is defined thus:
CREATE TABLE emp_act
(empno     CHARACTER (00006)  NOT NULL
,projno    CHARACTER (00006)  NOT NULL
,actno     SMALLINT           NOT NULL
,emptime   DECIMAL   (05,02)
,emstdate  DATE
,emendate  DATE);
INSERT INTO {table-name | view-name | (full-select)} [( column-name [, ...] )]
   [INCLUDE ( column-name data-type [, ...] )]
   { VALUES ( expression [, ...] ) [, ...]
   | [WITH common-table-expression] full-select }
One can insert into a table, view, nickname, or SQL expression. For views and SQL expressions, the following rules apply:
The list of columns selected cannot include a column function (e.g. MIN).
The list of columns selected must include all those needed to insert a new row.
The list of columns selected cannot include one defined from a constant, expression, or a
scalar function.
Sub-queries, and other predicates, are fine, but are ignored (see figure 120).
A "union all" is permitted - as long as the underlying tables on either side of the union
have check constraints such that a row being inserted is valid for one, and only one, of
the tables in the union.
All bets are off if the insert is going to a table that has an INSTEAD OF trigger defined.
Usage Notes
One has to provide a list of the columns (to be inserted) if the set of values provided does
not equal the complete set of columns in the target table, or are not in the same order as
the columns are defined in the target table.
The columns in the INCLUDE list are not inserted. They are intended to be referenced in
a SELECT statement that encompasses the INSERT (see page 54).
The input data can either be explicitly defined using the VALUES statement, or retrieved
from some other table using a full-select.
Direct Insert
To insert a single row, where all of the columns are populated, one lists the input values in
the same order as the columns are defined in the table:
INSERT INTO emp_act VALUES
('100000' ,'ABC' ,10 ,1.4 ,'2003-10-22', '2003-11-24');
VALUES
,10 ,1.4 ,2003-10-22, 2003-11-24)
,10 ,1.4 ,2003-10-22, 2003-11-24)
,10 ,1.4 ,2003-10-22, 2003-11-24);
The NULL and DEFAULT keywords can be used to assign these values to columns. One can
also refer to special registers, like the current date and current time:
INSERT INTO emp_act VALUES
('400000' ,'ABC' ,10 ,NULL ,DEFAULT, CURRENT DATE);
The next statement inserts a row into a full-select that just happens to have a predicate which,
if used in a subsequent query, would not find the row inserted. The predicate has no impact
on the insert itself:
INSERT INTO
   (SELECT *
    FROM   emp_act
    WHERE  empno < '1'
   )
VALUES ('510000' ,'ABC' ,10 ,1.4 ,'2003-10-22', '2003-11-24');
One can insert a set of rows that is the result of a query using the following notation:
INSERT INTO emp_act
SELECT   LTRIM(CHAR(id + 600000))
        ,SUBSTR(UCASE(name),1,6)
        ,salary / 229
        ,123
        ,CURRENT DATE
        ,'2003-11-11'
FROM     staff
WHERE    id < 50;
If only some columns are inserted using the query, they need to be explicitly listed:
INSERT INTO emp_act (empno, actno, projno)
SELECT   LTRIM(CHAR(id + 700000))
        ,MINUTE(CURRENT TIME)
        ,'DEF'
FROM     staff
WHERE    id < 40;
The select can also refer to a common table expression. In the following example, six values
are first generated, each in a separate row. These rows are then selected from during the insert:
INSERT INTO emp_act (empno, actno, projno, emptime)
WITH temp1 (col1) AS
(VALUES (1),(2),(3),(4),(5),(6))
SELECT LTRIM(CHAR(col1 + 910000))
,col1
,CHAR(col1)
,col1 / 2
FROM
temp1;
Below are two tables that hold data for US and international customers respectively:
CREATE TABLE us_customer
(cust#    INTEGER   NOT NULL
,cname    CHAR(10)  NOT NULL
,country  CHAR(03)  NOT NULL
,CHECK    (country = 'USA')
,PRIMARY KEY (cust#));
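The rule given earlier, that one can insert into a "union all" as long as check constraints route each row to exactly one table, can be sketched as follows. The second table (INTL_CUSTOMER), the view name (ALL_CUSTOMER), and the data values are assumptions made for illustration.

```sql
-- Hypothetical companion table. Its check constraint is the exact
-- opposite of the one on US_CUSTOMER, so no row can be valid for
-- both tables.
CREATE TABLE intl_customer
(cust#    INTEGER   NOT NULL
,cname    CHAR(10)  NOT NULL
,country  CHAR(03)  NOT NULL
,CHECK    (country <> 'USA')
,PRIMARY KEY (cust#));

-- Hypothetical view that combines the two tables.
CREATE VIEW all_customer AS
SELECT * FROM us_customer
UNION ALL
SELECT * FROM intl_customer;

-- Each row should go to the one table whose check constraint it
-- satisfies: the first to US_CUSTOMER, the second to INTL_CUSTOMER.
INSERT INTO all_customer VALUES
 (111, 'Fred', 'USA')
,(222, 'Juan', 'MEX');
```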
Update
The UPDATE statement is used to change one or more columns/rows in a table, view, or full-select. Each column that is to be updated has to be specified. Here is an example:
UPDATE   emp_act
SET      emptime  = NULL
        ,emendate = DEFAULT
        ,emstdate = CURRENT DATE + 2 DAYS
        ,actno    = ACTNO / 2
        ,projno   = 'ABC'
WHERE    empno    = '100000';
UPDATE {table-name | view-name | (full-select)} [corr-name]
   [INCLUDE ( column-name data-type [, ...] )]
   SET column-name = expression [, ...]
   [WHERE predicates]
One can update rows in a table, view, or full-select. If the object is not a table, then it
must be updateable (i.e. refer to a single table, not have any column functions, etc).
The correlation name is optional, and is only needed if there is an expression or predicate
that references another table.
The columns in the INCLUDE list are not updated. They are intended to be referenced in
a SELECT statement that encompasses the UPDATE (see page 54).
The SET statement lists the columns to be updated, and the new value they will get.
Predicates are optional. If none are provided, all rows in the table are updated.
Update Examples
UPDATE   emp_act
SET      actno = actno / 2;

UPDATE   emp_act ac1
SET      actno   = actno * 2
        ,emptime = actno * 2
WHERE    empno LIKE '910%';

UPDATE   emp_act
SET      actno = (SELECT MAX(salary)
                  FROM   staff)
WHERE    empno = '200000';
An update statement can be run against a table, a view, or a full-select. In the next example,
the table is referred to directly:
UPDATE   emp_act
SET      emptime = 10
WHERE    empno   = '000010'
  AND    projno  = 'MA2100';
Imagine that we want to set the employee-time for a particular row in the EMP_ACT table to
the MAX time for that employee. Below is one way to do it:
UPDATE   emp_act ea1
SET      emptime = (SELECT MAX(emptime)
                    FROM   emp_act ea2
                    WHERE  ea1.empno = ea2.empno)
WHERE    empno   = '000010'
  AND    projno  = 'MA2100';
The same result can be achieved by calling an OLAP function in a full-select, and then updating the result. In the next example, the MAX employee-time per employee is calculated (for each
row), and placed in a new column. This column is then used to do the final update:
UPDATE
   (SELECT ea1.*
          ,MAX(emptime) OVER(PARTITION BY empno) AS maxtime
    FROM   emp_act ea1
   )AS ea2
SET      emptime = maxtime
WHERE    empno   = '000010'
  AND    projno  = 'MA2100';
Figure 139, Use OLAP function to get max-time, then apply (correct)
The above statement has the advantage of only accessing the EMP_ACT table once. If there
were many rows per employee, and no suitable index (i.e. on EMPNO and EMPTIME), it
would be much faster than the prior update.
The next update is similar to the prior - but it does the wrong update! In this case, the scope of
the OLAP function is constrained by the predicate on PROJNO, so it no longer gets the MAX
time for the employee:
UPDATE   emp_act
SET      emptime = MAX(emptime) OVER(PARTITION BY empno)
WHERE    empno   = '000010'
  AND    projno  = 'MA2100';
Figure 140, Use OLAP function to get max-time, then apply (wrong)
Correlated and Uncorrelated Update
In the next example, regardless of the number of rows updated, the ACTNO will always come
out as one. This is because the sub-query that calculates the row-number is correlated, which
means that it is resolved again for each row to be updated in the "AC1" table. At most, one
"AC2" row will match, so the row-number must always equal one:
UPDATE   emp_act ac1
SET     (actno
        ,emptime) = (SELECT ROW_NUMBER() OVER()
                           ,ac1.emptime / 2
                     FROM   emp_act ac2
                     WHERE  ac2.empno           LIKE '60%'
                       AND  SUBSTR(ac2.empno,3) = SUBSTR(ac1.empno,3))
WHERE    empno LIKE '800%';
Delete
The DELETE statement is used to remove rows from a table, view, or full-select. The set of
rows deleted depends on the scope of the predicates used. The following example would delete a single row from the EMP_ACT sample table:
DELETE
FROM     emp_act
WHERE    empno  = '000010'
  AND    projno = 'MA2100'
  AND    actno  = 10;

DELETE FROM {table-name | view-name | (full-select)} [corr-name]
   [INCLUDE ( column-name data-type [, ...] )]
   [WHERE predicates]
One can delete rows from a table, view, or full-select. If the object is not a table, then it
must be deletable (i.e. refer to a single table, not have any column functions, etc).
The correlation name is optional, and is only needed if there is a predicate that references
another table.
The columns in the INCLUDE list are not updated. They are intended to be referenced in
a SELECT statement that encompasses the DELETE (see page 54).
Predicates are optional. If none are provided, all rows are deleted.
Basic Delete
DELETE
FROM     emp_act;

DELETE
FROM     emp_act
WHERE    empno  LIKE '00%'
  AND    projno >= 'MA';
The next example deletes all the rows in the STAFF table - except those that have the highest
ID in their respective department:
DELETE
FROM     staff s1
WHERE    id NOT IN
        (SELECT MAX(id)
         FROM   staff s2
         WHERE  s1.dept = s2.dept);

DELETE
FROM     staff s1
WHERE    EXISTS
        (SELECT *
         FROM   staff s2
         WHERE  s2.dept = s1.dept
           AND  s2.id   > s1.id);
A delete removes all encompassing rows. Sometimes this is not desirable - usually because an
unknown, and possibly undesirably large, number of rows is deleted. One can write a delete that
stops after "n" rows, but the code is not pretty. The logic goes as follows:
Select from the nested table expression the first "n" rows.
Delete from the real table all rows matching those in the nested table expression.
The above code can only work as intended if the table in question has a set of fields that make
up a unique key. One has to code the final delete to join to the nested table expression using
those fields - as is done in the following example:
DELETE
FROM     emp_act
WHERE   (empno, projno, actno) IN
        (SELECT empno
               ,projno
               ,actno
         FROM  (SELECT eee.*
                      ,ROW_NUMBER() OVER(ORDER BY empno, projno,
                                                  actno) AS r#
                FROM   emp_act eee
               )AS xxx
         WHERE  r# <= 10);
SELECT column-list
FROM   {OLD | NEW | FINAL} TABLE (DML stmt)
[WHERE predicates]
[ORDER BY {sort-columns | INPUT SEQUENCE}]
OLD: Returns the state of the data prior to the statement being run. This is allowed for an
update and a delete.
NEW: Returns the state of the data prior to the application of any AFTER triggers or referential constraints. Data in the table will not equal what is returned if it is subsequently
changed by AFTER triggers or R.I. This is allowed for an insert and an update.
FINAL: Returns the final state of the data. If there are AFTER triggers that alter the target
table after the statement has run, an error is returned. Ditto for a view that is defined
with an INSTEAD OF trigger. This is allowed for an insert and an update.
Usage Notes
Only one of the above tables can be listed in the FROM statement.
The table listed in the FROM statement cannot be given a correlation name.
No other table can be listed (i.e. joined to) in the FROM statement. One can reference
another table in the SELECT list (see example page 57), or by using a sub-query in the
predicate section of the statement.
To retrieve (generated) columns that are not in the target table, list them in an INCLUDE
phrase in the DML statement. This technique can be used to, for example, assign row
numbers to the set of rows entered during an insert.
Predicates (on the select) are optional. They have no impact on the underlying DML.
The INPUT SEQUENCE phrase can be used in the ORDER BY to retrieve the rows in
the same sequence as they were inserted. It is not valid in an update or delete.
The usual scalar functions, OLAP functions, and column functions, plus the GROUP BY
phrase, can be applied to the output - as desired.
Insert Examples
The example below selects from the final result of the insert:
SELECT empno
      ,projno AS prj
      ,actno  AS act
FROM   FINAL TABLE
      (INSERT INTO emp_act
       VALUES ('200000','ABC',10,1,'2003-10-22','2003-11-24')
             ,('200000','DEF',10,1,'2003-10-22','2003-11-24'))
ORDER BY 1,2,3;
ANSWER
==============
EMPNO  PRJ ACT
------ --- ---
200000 ABC  10
200000 DEF  10
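One way to retrieve the new rows in the order that they were inserted is to add a sequence-number column to the insert, using the INCLUDE phrase:
SELECT empno
      ,projno AS prj
      ,actno  AS act
      ,row#   AS r#
FROM   FINAL TABLE
      (INSERT INTO emp_act (empno, projno, actno)
       INCLUDE (row# SMALLINT)
       VALUES ('300000','ZZZ',999,1)
             ,('300000','VVV',111,2))
ORDER BY row#;
ANSWER
=================
EMPNO  PRJ ACT R#
------ --- --- --
300000 ZZZ 999  1
300000 VVV 111  2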
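The next example uses the INPUT SEQUENCE phrase to retrieve the new rows in the order that they were inserted, and assigns row numbers to the output:
SELECT empno
      ,projno AS prj
      ,actno  AS act
      ,ROW_NUMBER() OVER() AS r#
FROM   FINAL TABLE
      (INSERT INTO emp_act (empno, projno, actno)
       VALUES ('400000','ZZZ',999)
             ,('400000','VVV',111))
ORDER BY INPUT SEQUENCE;
ANSWER
=================
EMPNO  PRJ ACT R#
------ --- --- --
400000 ZZZ 999  1
400000 VVV 111  2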
In the next example, the only way to know for sure what the insert has done is to select from
the result. This is because the select statement (in the insert) has the following unknowns:
We do not, or may not, know what ID values were selected, and thus inserted.
SELECT empno
      ,projno AS prj
      ,actno  AS act
      ,ROW_NUMBER() OVER() AS r#
FROM   NEW TABLE
      (INSERT INTO emp_act (empno, actno, projno)
       SELECT LTRIM(CHAR(id + 600000))
             ,SECOND(CURRENT TIME)
             ,CHAR(SMALLINT(RAND(1) * 1000))
       FROM   staff
       WHERE  id < 40)
ORDER BY INPUT SEQUENCE;
ANSWER
=================
EMPNO  PRJ ACT R#
------ --- --- --
600010 1    59  1
600020 563  59  2
600030 193  59  3
The statement below updates the matching rows by a fixed amount. The select statement gets
the old EMPTIME values:
SELECT empno
      ,projno  AS prj
      ,emptime AS etime
FROM   OLD TABLE
      (UPDATE emp_act
       SET    emptime = emptime * 2
       WHERE  empno   = '200000')
ORDER BY projno;
ANSWER
================
EMPNO  PRJ ETIME
------ --- -----
200000 ABC  1.00
200000 DEF  1.00
The next statement uses the INCLUDE phrase to return both the old and the new EMPTIME values:
SELECT projno  AS prj
      ,old_t   AS old_t
      ,emptime AS new_t
FROM   NEW TABLE
      (UPDATE emp_act
       INCLUDE (old_t DECIMAL(5,2))
       SET     emptime = emptime * RAND(1) * 10
              ,old_t   = emptime
       WHERE   empno   = '200000')
ORDER BY 1;
ANSWER
===============
PRJ OLD_T NEW_T
--- ----- -----
ABC  2.00  0.02
DEF  2.00 11.27
The following example lists the rows that were deleted:
SELECT projno AS prj
      ,actno  AS act
FROM   OLD TABLE
      (DELETE
       FROM   emp_act
       WHERE  empno = '300000')
ORDER BY 1,2;
ANSWER
=======
PRJ ACT
--- ---
VVV 111
ZZZ 999
SELECT empno
      ,projno
      ,actno AS act
      ,row#  AS r#
FROM   OLD TABLE
      (DELETE
       FROM    emp_act
       INCLUDE (row# SMALLINT)
       SET     row#  = ROW_NUMBER() OVER()
       WHERE   empno = '000260')
WHERE  row# = row# / 2 * 2
ORDER BY 1,2,3;
ANSWER
====================
EMPNO  PROJNO ACT R#
------ ------ --- --
000260 AD3113  70  2
000260 AD3113  80  4
000260 AD3113 180  6
One cannot join the table generated by a DML statement to another table, nor include it in a
nested table expression, but one can join in the SELECT phrase. The following delete illustrates this concept by joining to the EMPLOYEE table:
SELECT empno
      ,(SELECT lastname
        FROM  (SELECT empno AS e#
                     ,lastname
               FROM   employee
              )AS xxx
        WHERE  empno = e#)
      ,projno AS projno
      ,actno  AS act
FROM   OLD TABLE
      (DELETE
       FROM   emp_act
       WHERE  empno < '0001')
FETCH FIRST 5 ROWS ONLY;
ANSWER
==========================
EMPNO  LASTNAME PROJNO ACT
------ -------- ------ ---
000010 HAAS     AD3100  10
000010 HAAS     MA2100  10
000010 HAAS     MA2110  10
000020 THOMPSON PL2100  30
000030 KWAN     IF1000  10
Merge
The MERGE statement is a combination insert and update, or delete, statement on steroids. It
can be used to take the data from a source table, and combine it with the data in a target table.
The qualifying rows in the source and target tables are first matched by unique key value, and
then evaluated:
If the source row is already in the target, the latter can be either updated or deleted.
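MERGE INTO {table-name | view-name | (full-select)} [corr-name]
USING      {table-name | view-name | (full-select)} [corr-name]
ON         search-conditions
WHEN MATCHED     [AND search-c] THEN {UPDATE SET... | DELETE | SIGNAL...}
WHEN NOT MATCHED [AND search-c] THEN {INSERT... | SIGNAL...}
[ELSE IGNORE]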
Correlation names are optional, but are required if the field names are not unique.
If the target of the merge is a full-select or a view, it must allow updates, inserts, and deletes - as if it were an ordinary table.
The ON conditions must uniquely identify the matching rows in the target table.
Each individual WHEN check can only invoke a single modification statement.
When a MATCHED search condition is true, the matching target row can be updated,
deleted, or an error can be flagged.
When a NOT MATCHED search condition is true, the source row can be inserted into
the target table, or an error can be flagged.
When more than one MATCHED or NOT MATCHED search condition is true, the first
one that matches (for each type) is applied. This prevents any target row from being updated or deleted more than once. Ditto for any source row being inserted.
The ELSE IGNORE phrase specifies that no action be taken if no WHEN check evaluates to true.
Sample Tables
To illustrate the merge statement, the following test tables were created and populated:
INSERT INTO old_staff
SELECT id, job, salary
FROM   staff
WHERE  id BETWEEN 20 AND 40;
OLD_STAFF
+-----------------+
|ID|JOB  |SALARY  |
|--|-----|--------|
|20|Sales|18171.25|
|30|Mgr  |17506.75|
|40|Sales|18006.00|
+-----------------+
INSERT INTO new_staff
SELECT id, salary / 10
FROM   staff
WHERE  id BETWEEN 30 AND 50;
NEW_STAFF
+----------+
|ID|SALARY |
|--|-------|
|30|1750.67|
|40|1800.60|
|50|2065.98|
+----------+
The next statement merges the new staff table into the old, using the following rules:
AFTER-MERGE
=================
ID JOB   SALARY
-- ----- --------
20 Sales 18171.25
30 Mgr    1750.67
40 Sales  1800.60
50 ?      2065.98
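The merge statement that produced this result is not reproduced here. As a minimal sketch, assuming the rules are "if the ID matches, replace the old salary with the new one; otherwise insert the new row with a job of ?", a statement consistent with the before and after tables might look like this (the rules are inferred from the tables, not quoted from the text):

```sql
-- Sketch only: rules inferred from the OLD_STAFF, NEW_STAFF and
-- AFTER-MERGE tables. The '?' job value is taken from the result.
MERGE INTO old_staff oo
USING      new_staff nn
ON         oo.id = nn.id
WHEN MATCHED THEN
   UPDATE
   SET    oo.salary = nn.salary
WHEN NOT MATCHED THEN
   INSERT
   VALUES (nn.id,'?',nn.salary);
```

Row 20 has no match in the source, so it is left alone. Rows 30 and 40 get the new salary, and row 50 is inserted.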
AFTER-MERGE
=================
ID JOB   SALARY
-- ----- --------
20 Sales 18171.25
If no row matches, and the new ID is > 10, the new row is inserted.
If no row matches, and (by implication) the new ID is <= 10, an error is flagged.
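The two rules above can be sketched as follows. This is an illustration only: the SQLSTATE value and the message text are assumptions, and the handling of matching rows (which the rules above do not cover) is left out.

```sql
-- Sketch only: covers just the two NOT MATCHED rules listed above.
-- The SQLSTATE '70001' and the message text are illustrative values.
MERGE INTO old_staff oo
USING      new_staff nn
ON         oo.id = nn.id
WHEN NOT MATCHED AND nn.id > 10 THEN
   INSERT
   VALUES (nn.id,'?',nn.salary)
WHEN NOT MATCHED THEN
   SIGNAL SQLSTATE '70001'
   SET MESSAGE_TEXT = 'New ID <= 10';
```

Because the WHEN checks are evaluated in order, the second NOT MATCHED check is only reached when the new ID is not greater than 10.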
The following merge generates an input table (i.e. full-select) that has a single row containing
the MAX value of every field in the relevant table. This row is then inserted into the table:
MERGE INTO old_staff
USING
  (SELECT MAX(id) + 1 AS max_id
         ,MAX(job)    AS max_job
         ,MAX(salary) AS max_sal
   FROM   old_staff
  )AS mx
ON id = max_id
WHEN NOT MATCHED THEN
   INSERT
   VALUES (max_id, max_job, max_sal);
AFTER-MERGE
=================
ID JOB   SALARY
-- ----- --------
20 Sales 18171.25
30 Mgr   17506.75
40 Sales 18006.00
41 Sales 18171.25
MERGE INTO
  (SELECT *
   FROM   old_staff
   WHERE  id < 40
  )AS oo
USING
  (SELECT *
   FROM   new_staff
   WHERE  id < 50
  )AS nn
ON oo.id = nn.id
WHEN MATCHED THEN
   DELETE
WHEN NOT MATCHED THEN
   INSERT
   VALUES (nn.id,'?',nn.salary);
AFTER-MERGE
=================
ID JOB   SALARY
-- ----- --------
20 Sales 18171.25
40 ?      1800.60
40 Sales 18006.00
The target row with an ID of 40 was not deleted, because it was excluded in the full-select that was done before the merge.
The source row with an ID of 40 was inserted, because it was not found in the target full-select. This is why the base table now has two rows with an ID of 40.
The source row with an ID of 50 was not inserted, because it was excluded in the full-select that was done before the merge.
Listing Columns
The next example explicitly lists the target fields in the insert statement - so they correspond
to those listed in the following values phrase:
MERGE INTO old_staff oo
USING      new_staff nn
ON         oo.id = nn.id
WHEN MATCHED THEN
   UPDATE
   SET (salary,job) = (1234,'?')
WHEN NOT MATCHED THEN
   INSERT (id,salary,job)
   VALUES (id,5678.9,'?');
AFTER-MERGE
=================
ID JOB   SALARY
-- ----- --------
20 Sales 18171.25
30 ?      1234.00
40 ?      1234.00
50 ?      5678.90
Compound SQL
A compound statement groups multiple independent SQL statements into a single executable.
In addition, simple processing logic can be included to create what is, in effect, a very basic
program. Such statements can be embedded in triggers, SQL functions, SQL methods, and
dynamic SQL statements.
Introduction
A compound SQL statement begins with an (optional) name, followed by the variable declarations, followed by the procedural logic:
[label:] BEGIN ATOMIC
   [DECLARE var-name [, var-name ...] data-type [DEFAULT NULL | DEFAULT default-value] ;]
   [DECLARE cond-name CONDITION FOR SQLSTATE [VALUE] string-constant ;]
   [SQL-procedure-stmt ;]
END [label]
DB2 SQL does not come with a designated statement delimiter (terminator), though a semi-colon is usually used. However, a semi-colon cannot be used as the delimiter of a compound SQL statement
because that character is used to separate the sub-components of the statement.
In DB2BATCH, one can run the SET DELIMITER command (intelligent comment) to use
something other than a semi-colon. The following script illustrates this usage:
--#SET DELIMITER !
SELECT NAME FROM STAFF WHERE ID = 10!
--#SET DELIMITER ;
SELECT NAME FROM STAFF WHERE ID = 20;
FOR statement
IF statement
ITERATE statement
LEAVE statement
SIGNAL statement
WHILE statement
NOTE: There are many more PSM control statements than what is shown above. But only
these ones can be used in Compound SQL statements.
full-select
UPDATE
DELETE
INSERT
DECLARE Variables
All variables have to be declared at the start of the compound statement. Each variable must
be given a name and a type and, optionally, a default (start) value.
BEGIN ATOMIC
   DECLARE aaa, bbb, ccc SMALLINT DEFAULT 1;
   DECLARE ddd           CHAR(10) DEFAULT NULL;
   DECLARE eee           INTEGER;
   SET eee = aaa + 1;
   UPDATE staff
   SET    comm   = aaa
         ,salary = bbb
         ,years  = eee
   WHERE  id     = 10;
END
FOR Statement
The FOR statement executes a group of statements for each row fetched from a query.
[label:] FOR for-loop-name AS [cursor-name CURSOR FOR] select-stmt
DO
   SQL-procedure-stmt ;
END FOR [label]
BEGIN ATOMIC
   FOR v1 AS
      SELECT   dept    AS dname
              ,max(id) AS max_id
      FROM     staff
      GROUP BY dept
      HAVING   COUNT(*) > 1
      ORDER BY dept
   DO
      UPDATE staff
      SET    id   = id * -1
      WHERE  id   = max_id;
      UPDATE staff
      SET    dept = dept / 10
      WHERE  dept = dname
        AND  dept < 30;
   END FOR;
END
The GET DIAGNOSTICS statement returns information about the most recently run SQL
statement. One can either get the number of rows processed (i.e. inserted, updated, or deleted), or the return status (for an external procedure call).
GET DIAGNOSTICS SQL-var-name = {ROW_COUNT | RETURN_STATUS}
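As a minimal sketch of the row-count usage, the following compound statement updates a set of STAFF rows, captures how many rows were changed, and then stores that number in another row. The variable name and the values used are illustrative assumptions, and the script uses an "!" as the statement delimiter:

```sql
--#SET DELIMITER !
BEGIN ATOMIC
   -- numrows is a hypothetical variable name
   DECLARE numrows INT DEFAULT 0;
   UPDATE staff
   SET    salary = 12345
   WHERE  id     < 100;
   -- Capture the number of rows changed by the update just run
   GET DIAGNOSTICS numrows = ROW_COUNT;
   UPDATE staff
   SET    salary = numrows
   WHERE  id     = 10;
END!
```

The GET DIAGNOSTICS statement has to come immediately after the statement of interest, because it only reports on the most recently run statement.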
IF Statement
The IF statement is used to do standard if-then-else branching logic. It always begins with an
IF THEN statement and ends with an END IF statement.
IF search-condition THEN SQL-procedure-stmt ;
[ELSEIF search-condition THEN SQL-procedure-stmt ;]
[ELSE SQL-procedure-stmt ;]
END IF
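The branching logic can be sketched as follows. This example is an illustration only: the variable name, the thresholds, and the values assigned are assumptions. It updates one of three STAFF rows, depending on the microsecond part of the current timestamp:

```sql
--#SET DELIMITER !
BEGIN ATOMIC
   -- cur is a hypothetical variable name
   DECLARE cur INT;
   SET cur = MICROSECOND(CURRENT TIMESTAMP);
   IF cur > 600000 THEN
      UPDATE staff
      SET    years = 3
      WHERE  id    = 10;
   ELSEIF cur > 300000 THEN
      UPDATE staff
      SET    years = 2
      WHERE  id    = 20;
   ELSE
      UPDATE staff
      SET    years = 1
      WHERE  id    = 30;
   END IF;
END!
```

Only the first branch whose condition is true is run. The ELSE branch is run when none of the conditions are true.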
The ITERATE statement causes the program to return to the beginning of the labeled loop.
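ITERATE label
LEAVE Statement
The LEAVE statement exits the labeled loop.
LEAVE label
SIGNAL Statement
The SIGNAL statement is used to issue an error or warning message.
SIGNAL {SQLSTATE [VALUE] sqlstate-string | condition-name}
   [SET MESSAGE_TEXT = {variable-name | diagnostic-string}]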
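The following sketch combines a labeled loop, a LEAVE, and a SIGNAL. It is an illustration only: the label, the variable names, the SQLSTATE value, and the message text are all assumptions. The loop is left early if a random number exceeds 0.99, in which case an error is flagged:

```sql
--#SET DELIMITER !
BEGIN ATOMIC
   DECLARE cntr INT DEFAULT 1;
   DECLARE emsg CHAR(20);
   whileloop:
   WHILE cntr < 60 DO
      SET cntr = cntr + 1;
      -- Exit the labeled loop early, roughly once in a hundred passes
      IF RAND() > 0.99 THEN
         LEAVE whileloop;
      END IF;
   END WHILE;
   -- A counter below 60 means that the loop was left early
   IF cntr < 60 THEN
      SET emsg = 'LOOP LEFT EARLY';
      SIGNAL SQLSTATE '75001' SET MESSAGE_TEXT = emsg;
   END IF;
END!
```

Because the compound statement is atomic, a SIGNAL that raises an error also undoes any changes made earlier in the same statement.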
The WHILE statement repeats one or more statements while some condition is true.
[label:] WHILE search-condition DO
   SQL-procedure-stmt ;
END WHILE [label]
The next statement has two nested WHILE loops, and then updates the STAFF table:
BEGIN ATOMIC
   DECLARE c1, c2 INT DEFAULT 1;
   WHILE c1 < 10 DO
      WHILE c2 < 20 DO
         SET c2 = c2 + 1;
      END WHILE;
      SET c1 = c1 + 1;
   END WHILE;
   UPDATE staff
   SET    salary = c1
         ,comm   = c2
   WHERE  id     = 10;
END
Other Usage
The following DB2 objects also support the language elements described above:
Triggers.
Stored procedures.
User-defined functions.
Some of the above support many more language elements. For example stored procedures
that are written in SQL also allow the following: ASSOCIATE, CASE, GOTO, LOOP, REPEAT, RESIGNAL, and RETURN.
NOTE: To write stored procedures in the SQL language, you need a C compiler.
Test Query
To illustrate some of the above uses of compound SQL, we are going to get from the STAFF
table a complete list of departments, and the number of rows in each department. Here is the
basic query, with the related answer:
SELECT   dept
        ,count(*) as #rows
FROM     staff
GROUP BY dept
ORDER BY dept;
ANSWER
==========
DEPT #ROWS
---- -----
  10     4
  15     4
  20     4
  38     5
  42     4
  51     5
  66     5
  84     4
Trigger
One cannot get an answer using a trigger. All one can do is alter what happens during an insert, update, or delete. With this in mind, the following example does three things:
Sets the statement delimiter to an "!". Because we are using compound SQL inside the
trigger definition, we cannot use the usual semi-colon.
Creates a new table (note: triggers are not allowed on temporary tables).
Creates an INSERT trigger on the new table. This trigger gets the number of rows per
department in the STAFF table - for each row (department) inserted.
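The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the original code: the table name (dpt), its layout, the trigger name (dpt_ins1), and the variable name are all hypothetical.

```sql
--#SET DELIMITER !
-- Hypothetical table: one row per department, plus a row count
CREATE TABLE dpt
(dept    SMALLINT NOT NULL
,#names  SMALLINT
,PRIMARY KEY (dept))!
COMMIT!
-- Hypothetical trigger: for each department inserted, count the
-- matching STAFF rows and store the result in the new row
CREATE TRIGGER dpt_ins1
AFTER INSERT ON dpt
REFERENCING NEW AS nnn
FOR EACH ROW
MODE DB2SQL
BEGIN ATOMIC
   DECLARE namecnt SMALLINT DEFAULT 0;
   SET namecnt = (SELECT COUNT(*)
                  FROM   staff
                  WHERE  dept = nnn.dept);
   UPDATE dpt
   SET    #names = namecnt
   WHERE  dept   = nnn.dept;
END!
COMMIT!
-- Insert one row per department; the trigger fills in the count
INSERT INTO dpt (dept)
SELECT DISTINCT dept
FROM   staff!
COMMIT!
SELECT   *
FROM     dpt
ORDER BY dept!
```

The final select is what returns the answer. The trigger itself returns nothing; it only changes what ends up in the table.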
IMPORTANT: This example uses an "!" as the stmt delimiter.
ANSWER
===========
DEPT #NAMES
---- ------
  10      4
  15      4
  20      4
  38      5
  42      4
  51      5
  66      5
  84      4
Scalar Function
One can do something very similar to the above that is almost as stupid using a user-defined
scalar function, that calculates the number of rows in a given department. The basic logic will
go as follows:
Run a query that first gets a list of distinct departments, then calls the function.
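The function definition itself is not reproduced here. A minimal sketch of a scalar function with the name used in the query (dpt1), assuming that it takes a department number and returns the number of matching STAFF rows, might look like this (the parameter, variable, and loop names are hypothetical):

```sql
--#SET DELIMITER !
CREATE FUNCTION dpt1 (deptin SMALLINT)
RETURNS SMALLINT
BEGIN ATOMIC
   DECLARE num_names SMALLINT;
   -- The query returns one row, so the loop body runs once
   FOR getnames AS
      SELECT COUNT(*) AS #n
      FROM   staff
      WHERE  dept = deptin
   DO
      SET num_names = #n;
   END FOR;
   RETURN num_names;
END!
COMMIT!
```

The FOR loop is used here only to show compound SQL inside a function. A simple RETURN of the COUNT query would give the same result.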
SELECT   XXX.*
        ,dpt1(dept) as #names
FROM    (SELECT   dept
         FROM     staff
         GROUP BY dept
        )AS XXX
ORDER BY dept!
IMPORTANT: This example uses an "!" as the stmt delimiter.
ANSWER
===========
DEPT #NAMES
---- ------
  10      4
  15      4
  20      4
  38      5
  42      4
  51      5
  66      5
  84      4
Below is almost exactly the same logic, this time using a table function:
--#SET DELIMITER !
CREATE FUNCTION dpt2 ()
RETURNS TABLE (dept   SMALLINT
              ,#names SMALLINT)
BEGIN ATOMIC
   RETURN
   SELECT   dept
           ,count(*)
   FROM     staff
   GROUP BY dept
   ORDER BY dept;
END!
COMMIT!
--#SET DELIMITER ;
SELECT   *
FROM     TABLE(dpt2()) T1
ORDER BY dept;
IMPORTANT: This example uses an "!" as the stmt delimiter.
ANSWER
===========
DEPT #NAMES
---- ------
  10      4
  15      4
  20      4
  38      5
  42      4
  51      5
  66      5
  84      4
Sample Application
Consider the following two tables, which make up a very simple application:
CREATE TABLE customer_balance
(cust_id         INTEGER
,cust_name       VARCHAR(20)
,cust_sex        CHAR(1)
,num_sales       SMALLINT
,total_sales     DECIMAL(12,2)
,master_cust_id  INTEGER
,cust_insert_ts  TIMESTAMP
,cust_update_ts  TIMESTAMP);
CREATE TABLE us_sales
(invoice#        INTEGER
,cust_id         INTEGER
,sale_value      DECIMAL(18,2)
,sale_insert_ts  TIMESTAMP
,sale_update_ts  TIMESTAMP);
CUST_ID will be a unique positive integer value, always ascending, never reused, and
automatically generated by DB2. This field cannot be updated by a user.
CUST_NAME has the customer name. It can be anything, but not blank.
NUM_SALES will have a count of the sales (for the customer), as recorded in the related
US-sales table. The value will be automatically maintained by DB2. It cannot be updated
directly by a user.
TOTAL_SALES will have the sum sales (in US dollars) for the customer. The value will
be automatically updated by DB2. It cannot be updated directly by a user.
MASTER_CUST_ID will have, if there exists, the customer-ID of the customer that this
customer is a dependent of. If there is no master customer, the value is null. If the master
customer is deleted, this row will also be deleted (if possible).
CUST_INSERT_TS has the timestamp when the row was inserted. The value is automatically generated by DB2. Any attempt to change will induce an error.
CUST_UPDATE_TS has the timestamp when the row was last updated by a user (note:
not by a trigger as a result of a change to the US-sales table). The value is automatically
generated by DB2. Any attempt to change will induce an error.
A row can only be deleted when there are no corresponding rows in the US-sales table
(i.e. for the same customer).
US Sales Table
INVOICE#: will be a unique ascending integer value. The uniqueness will apply to the
US-sales table, plus any international sales tables (i.e. to more than one table).
CUST_ID is the customer ID, as recorded in the customer-balance table. No row can be
inserted into the US-sales table unless there is a corresponding row in the customer-balance table. Once inserted, this value cannot be updated.
SALE_VALUE is the value of the sale, in US dollars. When a row is inserted, this value
is added to the related total-sales value in the customer-balance table. If the value is subsequently updated, the total-sales value is maintained in sync.
SALE_INSERT_TS has the timestamp when the row was inserted. The value is automatically generated by DB2. Any attempt to change will induce an error.
SALE_UPDATE_TS has the timestamp when the row was last updated. The value is
automatically generated by DB2. Any attempt to change will induce an error.
Deleting a row from the US-sales table has no impact on the customer-balance table
(i.e. the total-sales is not decremented). But a row can only be deleted from the latter
when there are no more related rows in the US-sales table.
Enforcement Tools
Unique indexes.
Two of the fields are to contain US dollars, the implication being the data in these columns
should not be combined with columns that contain Euros, or Japanese Yen, or my shoe size.
To this end, we will define a distinct data type for US dollars:
CREATE DISTINCT TYPE us_dollars AS decimal(18,2) WITH COMPARISONS;
Now that we have defined the data type, we can create our first table:
CREATE TABLE customer_balance
(cust_id         INTEGER      NOT NULL
                 GENERATED ALWAYS AS IDENTITY
                    (START WITH 1
                    ,INCREMENT BY 1
                    ,NO CYCLE
                    ,NO CACHE)
,cust_name       VARCHAR(20)  NOT NULL
,cust_sex        CHAR(1)      NOT NULL
,num_sales       SMALLINT     NOT NULL
,total_sales     us_dollars   NOT NULL
,master_cust_id  INTEGER
,cust_insert_ts  TIMESTAMP    NOT NULL
,cust_update_ts  TIMESTAMP    NOT NULL
,PRIMARY KEY     (cust_id)
,CONSTRAINT c1 CHECK (cust_name <> '')
,CONSTRAINT c2 CHECK (cust_sex = 'F'
                   OR cust_sex = 'M')
,CONSTRAINT c3 FOREIGN KEY (master_cust_id)
   REFERENCES customer_balance (cust_id)
   ON DELETE CASCADE);
The customer-ID is defined as an identity column (see page 259), which means that the
value is automatically generated by DB2 using the rules given. The field cannot be updated by the user.
The customer-ID is defined as the primary key, which automatically generates a unique
index on the field, and also enables us to reference the field using a referential integrity
rule. Being a primary key prevents updates, but we had already prevented them because
the field is an identity column.
Constraint C3 relates the current row to a master customer, if one exists. Furthermore, if
the master customer is deleted, this row is also deleted.
All of the columns, except for the master-customer-id, are defined as NOT NULL, which
means that a value must be provided.
We still have several more business rules to enforce - relating to automatically updating fields
and/or preventing user updates. These will be enforced using triggers.
US-Sales Table
The invoice# is defined as the primary key, which automatically generates a unique index
on the field, and also prevents updates.
Constraint U2 checks that the customer-ID exists in the customer-balance table, and also
prevents rows from being deleted from the latter if there exists a related row in this table.
All of the columns are defined as NOT NULL, so a value must be provided for each.
A secondary non-unique index is defined on customer-ID, so that deletes to the customer-balance table (which require checking this table for related customer-ID rows) are as efficient as possible.
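The table definition that these notes describe is not reproduced here. A minimal sketch that is consistent with the notes, assuming the column list given at the start of this chapter and the US-dollars distinct type for the sale value, might look like this (the index name is hypothetical, and any constraints not mentioned in the notes are left out):

```sql
-- Sketch only: built from the notes above, not copied from the original
CREATE TABLE us_sales
(invoice#        INTEGER      NOT NULL
,cust_id         INTEGER      NOT NULL
,sale_value      us_dollars   NOT NULL
,sale_insert_ts  TIMESTAMP    NOT NULL
,sale_update_ts  TIMESTAMP    NOT NULL
,PRIMARY KEY     (invoice#)
,CONSTRAINT u2 FOREIGN KEY (cust_id)
   REFERENCES customer_balance
   ON DELETE RESTRICT);

-- Hypothetical index name
CREATE INDEX us_sales_cust ON us_sales (cust_id);
```

The ON DELETE RESTRICT rule is what stops a customer-balance row from being deleted while related US-sales rows still exist.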
Triggers
Triggers can sometimes be quite complex little programs. If coded incorrectly, they can do an
amazing amount of damage. As such, it pays to learn quite a lot before using them. Below are
some very brief notes, but please refer to the official DB2 documentation for a more detailed
description.
Individual triggers are defined on a table, and for a particular type of DML statement:
Insert.
Update.
Delete.
Row changed.
Statement run.
Before triggers change input values before they are entered into the table and/or flag an error.
After triggers do things after the row is changed. They may make more changes (to the target
table, or to other tables), induce an error, or invoke an external program. SQL statements that
select the changes made by DML (see page 54) cannot see the changes made by an after trigger if those changes impact the rows just changed.
The action of one "after" trigger can invoke other triggers, which may then invoke other triggers, and so on. Before triggers cannot do this because they can only act upon the input values
of the DML statement that invoked them.
When there are multiple triggers for a single table/action, each trigger is run for all rows before the next trigger is invoked - even if defined "for each row". Triggers are invoked in the
order that they were created.
Customer-Balance - Insert Trigger
For each row inserted into the Customer-Balance table we need to do the following:
For each row inserted into the US-sales table we need to do the following:
We need to use an "after" trigger to maintain the two related values in the Customer-Balance
table. This will invoke an update to change the target row:
CREATE TRIGGER sales_to_cust_ins1
AFTER INSERT
ON us_sales
REFERENCING NEW AS nnn
FOR EACH ROW
MODE DB2SQL
   UPDATE customer_balance ccc
   SET    ccc.num_sales   = ccc.num_sales + 1
         ,ccc.total_sales = DECIMAL(ccc.total_sales) +
                            DECIMAL(nnn.sale_value)
   WHERE  ccc.cust_id     = nnn.cust_id;
For each row updated in the US-sales table we need to do the following:
Propagate the change to the sale-value to the total-sales in the customer-balance table.
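The propagation rule above can be sketched as an "after update" trigger that subtracts the old sale value from the total and adds the new one. This is an illustration modeled on the insert trigger shown earlier. The trigger name is hypothetical.

```sql
-- Sketch only: the trigger name is hypothetical
CREATE TRIGGER sales_to_cust_upd1
AFTER UPDATE OF sale_value
ON us_sales
REFERENCING NEW AS nnn
            OLD AS ooo
FOR EACH ROW
MODE DB2SQL
   UPDATE customer_balance ccc
   SET    ccc.total_sales = DECIMAL(ccc.total_sales) -
                            DECIMAL(ooo.sale_value)  +
                            DECIMAL(nnn.sale_value)
   WHERE  ccc.cust_id     = nnn.cust_id;
```

The number of sales is not touched, because an update to an existing sale does not change how many sales the customer has.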
Conclusion
The above application will now have logically consistent data. There is, of course, nothing to
prevent an authorized user from deleting all rows, but whatever rows are in the two tables will
obey the business rules that we specified at the start.
Tools Used
Distinct types - to prevent one type of data from being combined with another type.
Referential integrity - to enforce relationships between rows/tables, and to enable cascading deletes when needed.
Column Functions
Introduction
By themselves, column functions work on the complete set of matching rows. One can use a
GROUP BY expression to limit them to a subset of matching rows. One can also use them in
an OLAP function to treat individual rows differently.
WARNING: Be very careful when using either a column function, or the DISTINCT clause,
in a join. If the join is incorrectly coded, and does some form of Cartesian Product, the column function may get rid of all the extra (wrong) rows so that it becomes very hard to
confirm that the answer is incorrect. Likewise, be appropriately suspicious whenever you
see that someone (else) has used a DISTINCT statement in a join. Sometimes, users add
the DISTINCT clause to get rid of duplicate rows that they didn't anticipate and don't understand.
AVG
Get the average (mean) value of a set of non-null rows. The column(s) must be numeric.
ALL is the default. If DISTINCT is used, duplicate values are ignored. If no rows match, the
null value is returned.
AVG ( [ALL | DISTINCT] expression )
SELECT AVG(dept)          AS a1
      ,AVG(ALL dept)      AS a2
      ,AVG(DISTINCT dept) AS a3
      ,AVG(dept/10)       AS a4
      ,AVG(dept)/10       AS a5
FROM   staff
HAVING AVG(dept) > 40;
ANSWER
==============
A1 A2 A3 A4 A5
-- -- -- -- --
41 41 40  3  4
Some database designers have an intense and irrational dislike of using nullable fields. What
they do instead is define all columns as not-null and then set the individual fields to zero (for
numbers) or blank (for characters) when the value is unknown. This solution is reasonable in
some situations, but it can cause the AVG function to give what is arguably the wrong answer.
One solution to this problem is some form of counseling or group therapy to overcome the
phobia. Alternatively, one can use the CASE expression to put null values back into the answer-set being processed by the AVG function. The following SQL statement uses a modified
version of the IBM sample STAFF table (all null COMM values were changed to zero) to
illustrate the technique:
UPDATE staff
SET    comm = 0
WHERE  comm IS NULL;
SELECT AVG(salary) AS salary
      ,AVG(comm)   AS comm1
      ,AVG(CASE comm
              WHEN 0 THEN NULL
              ELSE comm
           END)    AS comm2
FROM   staff;
ANSWER
===================
SALARY  COMM1 COMM2
------- ----- -----
16675.6 351.9 513.3
UPDATE staff
SET    comm = NULL
WHERE  comm = 0;
The AVG, MIN, MAX, and SUM functions all return a null value when there are no matching rows. One can use the COALESCE function, or a CASE expression, to convert the null value
into a suitable substitute. Both methodologies are illustrated below:
SELECT COUNT(*)                 AS c1
      ,AVG(salary)              AS a1
      ,COALESCE(AVG(salary),0)  AS a2
      ,CASE
          WHEN AVG(salary) IS NULL THEN 0
          ELSE AVG(salary)
       END                      AS a3
FROM   staff
WHERE  id < 10;
ANSWER
===========
C1 A1 A2 A3
-- -- -- --
 0  -  0  0
The AVG function only accepts numeric input. However, one can, with a bit of trickery, also
use the AVG function on a date field. First convert the date to the number of days since the
start of the Current Era, then get the average, then convert the result back to a date. Please be
aware that, in many cases, the average of a date does not really make good business sense.
Having said that, the following SQL gets the average birth-date of all employees:
SELECT AVG(DAYS(birthdate))
      ,DATE(AVG(DAYS(birthdate)))
FROM   employee;
ANSWER
=================
1      2
------ ----------
709113 1942-06-27
In some cases, getting the average of an average gives an overflow error. Inasmuch as you
shouldn't do this anyway, it is no big deal:
ANSWER
================
<Overflow error>
CORRELATION
I don't know a thing about statistics, so I haven't a clue what this function does. But I do know
that the SQL Reference is wrong - because it says the value returned will be between 0 and 1.
I found that it is between -1 and +1 (see below). The output type is float.
{CORRELATION | CORR} ( expression , expression )
ANSWER
===========================
COR11  COR12  COR23  COR34
------ ------ ------ ------
 1.000 -1.000 -0.017 -0.005
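The query that produced the above answer is not reproduced here. As a small sketch of the two ends of the range, assuming the sample STAFF table, correlating a column with itself should give a value at (or very close to) +1, while correlating it with its own negation should give a value at (or very close to) -1. The output column names are hypothetical.

```sql
-- Sketch only: illustrates the -1 to +1 range on the STAFF table
SELECT DEC(CORRELATION(salary, salary)    ,5,3) AS cor_same
      ,DEC(CORRELATION(salary, 0 - salary),5,3) AS cor_neg
FROM   staff;
```

The DEC function is used only to make the floating-point result easier to read.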
COUNT
Get the number of values in a set of rows. The result is an integer. The value returned depends
upon the options used:
COUNT ( {[ALL | DISTINCT] expression | *} )
ANSWER
=================
C1 C2 C3 C4 C5 C6
-- -- -- -- -- --
35 24 24 19 24  2
There are 35 rows in the STAFF table (see C1 above), but only 24 of them have non-null
commission values (see C2 above).
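The query behind the above answer is not reproduced here. A minimal sketch of the difference between counting rows and counting non-null values, assuming the unmodified sample STAFF table, is shown below. The output column names are hypothetical.

```sql
-- Sketch only: COUNT(*) counts rows; COUNT(comm) counts the rows
-- where COMM is not null; DISTINCT also removes duplicate values
SELECT COUNT(*)             AS all_rows
      ,COUNT(comm)          AS comm_values
      ,COUNT(DISTINCT comm) AS comm_distinct
FROM   staff;
```

Going by the text above, the first two values should be 35 and 24 respectively.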
If no rows match, the COUNT returns zero - except when the SQL statement also contains a
GROUP BY. In this latter case, the result is no row.
SELECT 'NO GP-BY' AS c1
      ,COUNT(*)   AS c2
FROM   staff
WHERE  id = -1
UNION
SELECT 'GROUP-BY' AS c1
      ,COUNT(*)   AS c2
FROM   staff
WHERE  id = -1
GROUP BY dept;
ANSWER
============
C1       C2
-------- --
NO GP-BY  0
COUNT_BIG
Get the number of rows or distinct values in a set of rows. Use this function if the result is too
large for the COUNT function. The result is of type decimal 31. If the DISTINCT option is
used, both duplicate and null values are eliminated. If no rows match, the result is zero.
COUNT_BIG ( {[ALL | DISTINCT] expression | *} )
SELECT COUNT_BIG(*)                 AS c1
      ,COUNT_BIG(dept)              AS c2
      ,COUNT_BIG(DISTINCT dept)     AS c3
      ,COUNT_BIG(DISTINCT dept/10)  AS c4
      ,COUNT_BIG(DISTINCT dept)/10  AS c5
FROM   STAFF;
ANSWER
===================
C1  C2  C3  C4  C5
--- --- --- --- ---
35. 35.  8.  7.  0.
COVARIANCE
Returns the covariance of a set of number pairs. The output type is float.
{COVARIANCE | COVAR} ( expression , expression )
ANSWER
===============================
COV11   COV12   COV23   COV34
------- ------- ------- -------
 83666. -83666. -1.4689 -0.0004
GROUPING
The GROUPING function is used in CUBE, ROLLUP, and GROUPING SETS statements to
identify what rows come from which particular GROUPING SET. A value of 1 indicates that
the corresponding data field is null because the row is from a GROUPING SET that does
not involve this field. Otherwise, the value is zero.
GROUPING ( expression )
SELECT   dept
        ,AVG(salary)    AS salary
        ,GROUPING(dept) AS df
FROM     staff
GROUP BY ROLLUP(dept)
ORDER BY dept;
ANSWER
================
DEPT SALARY   DF
---- -------- --
  10 20865.86  0
  15 15482.33  0
  20 16071.52  0
  38 15457.11  0
  42 14592.26  0
  51 17218.16  0
  66 17215.24  0
  84 16536.75  0
   - 16675.64  1
MAX
Get the maximum value of a set of rows. The use of the DISTINCT option has no effect. If no
rows match, the null value is returned.
MAX ( [ALL | DISTINCT] expression )
SELECT MAX(dept)
      ,MAX(ALL dept)
      ,MAX(DISTINCT dept)
      ,MAX(DISTINCT dept/10)
FROM   staff;
ANSWER
===============
1   2   3   4
--- --- --- ---
 84  84  84   8
Several DB2 scalar functions convert a value from one format to another, for example from
numeric to character. The function output format will not always have the same ordering
sequence as the input. This difference can affect MIN, MAX, and ORDER BY processing.
SELECT MAX(hiredate)
      ,CHAR(MAX(hiredate),USA)
      ,MAX(CHAR(hiredate,USA))
FROM   employee;
ANSWER
================================
1          2          3
---------- ---------- ----------
1980-09-30 09/30/1980 12/15/1976
SELECT MAX(id)         AS id
      ,MAX(CHAR(id))   AS chr
      ,MAX(DIGITS(id)) AS dig
FROM   staff;
ANSWER
===================
ID     CHR    DIG
------ ------ -----
   350 90     00350
ANSWER
=====================
ID    CHR  DIG
----- ---- ----------
  100 90   0000000240
MIN
Get the minimum value of a set of rows. The use of the DISTINCT option has no effect. If no
rows match, the null value is returned.
MIN ( [ALL | DISTINCT] expression )
SELECT MIN(dept)
      ,MIN(ALL dept)
      ,MIN(DISTINCT dept)
      ,MIN(DISTINCT dept/10)
FROM   staff;
ANSWER
===============
1   2   3   4
--- --- --- ---
 10  10  10   1
REGRESSION
{REGR_AVGX | REGR_AVGY | REGR_COUNT | REGR_INTERCEPT | REGR_ICPT |
 REGR_R2 | REGR_SLOPE | REGR_SXX | REGR_SXY | REGR_SYY} ( expression , expression )
REGR_AVGX returns a quantity that can be used to compute the validity of the regression model. The output is of type float.
REGR_COUNT returns the number of matching non-null pairs. The output is integer.
See the IBM SQL Reference for more details on the above functions.
                                                        ANSWERS
                                                        ==========
SELECT DEC(REGR_SLOPE(bonus,salary)    ,7,5) AS r_slope    0.01710
      ,DEC(REGR_INTERCEPT(bonus,salary),7,3) AS r_icpt     100.871
      ,INT(REGR_COUNT(bonus,salary)        ) AS r_count          3
      ,INT(REGR_AVGX(bonus,salary)         ) AS r_avgx       42833
      ,INT(REGR_AVGY(bonus,salary)         ) AS r_avgy         833
      ,INT(REGR_SXX(bonus,salary)          ) AS r_sxx    296291666
      ,INT(REGR_SXY(bonus,salary)          ) AS r_sxy      5066666
      ,INT(REGR_SYY(bonus,salary)          ) AS r_syy        86666
FROM   employee
WHERE  workdept = 'A00';
Get the standard deviation of a set of numeric values. If DISTINCT is used, duplicate values
are ignored. If no rows match, the result is null. The output format is double.
STDDEV ( [ALL | DISTINCT] expression )

SELECT AVG(dept)                       AS a1
      ,STDDEV(dept)                    AS s1
      ,DEC(STDDEV(dept),3,1)           AS s2
      ,DEC(STDDEV(ALL dept),3,1)       AS s3
      ,DEC(STDDEV(DISTINCT dept),3,1)  AS s4
FROM   staff;
Get the sum of a set of numeric values. If DISTINCT is used, duplicate values are ignored.
Null values are always ignored. If no rows match, the result is null.
SUM ( [ALL | DISTINCT] expression )

SELECT SUM(dept)          AS s1
      ,SUM(ALL dept)      AS s2
      ,SUM(DISTINCT dept) AS s3
      ,SUM(dept/10)       AS s4
      ,SUM(dept)/10       AS s5
FROM   staff;

ANSWER
========================
S1   S2   S3   S4   S5
---- ---- ---- ---- ----
1459 1459  326  134  145
VAR or VARIANCE
Get the variance of a set of numeric values. If DISTINCT is used, duplicate values are ignored. If no rows match, the result is null. The output format is double.
{VARIANCE | VAR} ( [ALL | DISTINCT] expression )

SELECT AVG(dept)                         AS a1
      ,VARIANCE(dept)                    AS s1
      ,DEC(VARIANCE(dept),4,1)           AS s2
      ,DEC(VARIANCE(ALL dept),4,1)       AS s3
      ,DEC(VARIANCE(DISTINCT dept),4,1)  AS s4
FROM   staff;
OLAP Functions
Introduction
The OLAP (Online Analytical Processing) functions enable one to sequence and rank query
rows. They are especially useful when the calling program is very simple.
The Bad Old Days
To really appreciate the value of the OLAP functions, one should try to do some seemingly
trivial task without them. To illustrate this point, below is a simple little query:
SELECT
FROM
WHERE
AND
ORDER BY
ANSWER
=================
JOB   ID SALARY
----- -- --------
Clerk 80 13504.60
Mgr   10 18357.50
Mgr   50 20659.80
ANSWER
============================
JOB   ID SALARY   SUMSAL   R
----- -- -------- -------- -
Clerk 80 13504.60 13504.60 1
Mgr   10 18357.50 31862.10 2
Mgr   50 20659.80 52521.90 3
The output is ordered on two fields, the first of which is not unique.
Below are several examples that use plain SQL to get the above answer. All of the examples
have the same generic design (i.e. join each matching row to itself and all previous matching
rows) and share similar problems (i.e. difficult to read, and poor performance).
Nested Table Expression
Below is a query that uses a nested table expression to get the additional fields. This SQL has
the following significant features:
The TABLE phrase is required because the nested table expression has a correlated reference to the prior table. See page 283 for more details on the use of this phrase.
There are no join predicates between the nested table expression output and the original
STAFF table. They are unnecessary because these predicates are provided in the body of
the nested table expression. With them there, and the above TABLE function, the nested
table expression is resolved once per row obtained from the staff s1 table.
The original literal predicates have to be repeated in the nested table expression.
The correlated predicates in the nested table expression have to match the ORDER BY
sequence (i.e. first JOB, then ID) in the final output.
ANSWER
============================
JOB   ID SALARY   SUMSAL   R
----- -- -------- -------- -
Clerk 80 13504.60 13504.60 1
Mgr   10 18357.50 31862.10 2
Mgr   50 20659.80 52521.90 3
The nested table expression is a partial Cartesian product. Each row fetched from "S1" is
joined to all prior rows (in "S2"), which quickly gets to be very expensive.
The join criteria match the ORDER BY fields. If the latter are suitably complicated, then
the join is going to be inherently inefficient.
In the next example, the STAFF table is joined to itself such that each matching row obtained
from the "S1" table is joined to all prior rows (plus the current row) in the "S2" table, where
"prior" is a function of the ORDER BY clause used. After the join, a GROUP BY is needed
in order to roll the matching "S2" rows up into one:
SELECT
FROM
WHERE
AND
AND
AND
AND
OR
AND
GROUP
ORDER
ANSWER
============================
JOB   ID SALARY   SUMSAL   R
----- -- -------- -------- -
Clerk 80 13504.60 13504.60 1
Mgr   10 18357.50 31862.10 2
Mgr   50 20659.80 52521.90 3
In our final example, two nested table expressions are used to get the answer. Both are done in
the SELECT part of the main query:
SELECT
Figure 243, Using Nested Table Expressions in Select to get additional fields
Once again, this query processes the matching rows multiple times, repeats predicates, has
join predicates that match the ORDER BY, and does a partial Cartesian product. The only
difference here is that this query commits all of the above sins twice.
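To make the "join each row to itself and all prior rows" design concrete, here is a rough sketch that sets it beside the OLAP equivalent. It is not DB2 code: the SQL runs in SQLite (version 3.25 or later is assumed, for window-function support) through Python's sqlite3 module, and the table holds only the three rows shown in the answers above. The helper name run is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (id INTEGER, job TEXT, salary REAL)")
# Only the three rows that appear in the answers above.
conn.executemany("INSERT INTO staff VALUES (?,?,?)",
                 [(10, "Mgr", 18357.50), (50, "Mgr", 20659.80),
                  (80, "Clerk", 13504.60)])

# Plain SQL: join each row (s1) to itself and to every prior row (s2),
# where "prior" follows the ORDER BY sequence (job, then id).
self_join = """
    SELECT   s1.job, s1.id, s1.salary
            ,SUM(s2.salary) AS sumsal
            ,COUNT(*)       AS r
    FROM     staff s1, staff s2
    WHERE    s2.job < s1.job
       OR   (s2.job = s1.job AND s2.id <= s1.id)
    GROUP BY s1.job, s1.id, s1.salary
    ORDER BY s1.job, s1.id
"""

# OLAP functions: one pass, no self-join, no repeated predicates.
olap = """
    SELECT   job, id, salary
            ,SUM(salary)  OVER(ORDER BY job, id) AS sumsal
            ,ROW_NUMBER() OVER(ORDER BY job, id) AS r
    FROM     staff
    ORDER BY job, id
"""

def run(sql):
    # Round the floating-point salaries so both forms compare cleanly.
    return [(job, id_, round(sal, 2), round(tot, 2), r)
            for job, id_, sal, tot, r in conn.execute(sql)]

for row in run(olap):
    print(row)
```

Both statements return the same three rows, but the self-join version has to restate the ordering logic as join predicates, which is the readability and performance cost described above.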
Conclusion
Almost anything that an OLAP function does can be done some other way using simple SQL.
But as the above examples illustrate, the alternatives are neither pretty nor efficient. And remember that the initial query used above was actually very simple. Feel free to try replacing
the OLAP functions in the following query with their SQL equivalents:
SELECT   dpt.deptname
        ,emp.empno
        ,emp.lastname
        ,emp.salary
        ,SUM(salary)  OVER(ORDER BY dpt.deptname ASC
                                   ,emp.salary   DESC
                                   ,emp.empno    ASC) AS sumsal
        ,ROW_NUMBER() OVER(ORDER BY dpt.deptname ASC
                                   ,emp.salary   DESC
                                   ,emp.empno    ASC) AS row#
FROM     employee   emp
        ,department dpt
WHERE    emp.firstnme LIKE '%S%'
  AND    emp.workdept    = dpt.deptno
  AND    dpt.admrdept LIKE 'A%'
  AND    NOT EXISTS
         (SELECT *
          FROM   emp_act eat
          WHERE  emp.empno   = eat.empno
            AND  eat.emptime > 10)
ORDER BY dpt.deptname ASC
        ,emp.salary   DESC
        ,emp.empno    ASC;
The RANK and DENSE_RANK functions enable one to rank the rows returned by a query.
The result is of type BIGINT.
{RANK() | DENSE_RANK()} OVER( [PARTITION BY partitioning expression, ...]
                              ORDER BY sort-key expression [asc option | desc option], ... )

asc option:   ASC  [NULLS LAST  | NULLS FIRST]
desc option:  DESC [NULLS FIRST | NULLS LAST]
The two functions differ in how they handle multiple rows with the same value:
The RANK function returns the number of preceding rows, plus one. If multiple rows
have equal values, they all get the same rank, while subsequent rows get a ranking that
counts all of the prior rows. Thus, there may be gaps in the ranking sequence.
The DENSE_RANK function returns the number of preceding distinct values, plus one.
If multiple rows have equal values, they all get the same rank. Each change in data value
causes the ranking number to be incremented by one.
SELECT   id
        ,years
        ,salary
        ,RANK()       OVER(ORDER BY years) AS rank#
        ,DENSE_RANK() OVER(ORDER BY years) AS dense#
        ,ROW_NUMBER() OVER(ORDER BY years) AS row#
FROM     staff
WHERE    id     < 100
  AND    years IS NOT NULL
ORDER BY years;

ANSWER
===================================
ID YEARS SALARY   RANK# DENSE# ROW#
-- ----- -------- ----- ------ ----
30     5 17506.75     1      1    1
40     6 18006.00     2      2    2
90     6 18001.75     2      2    3
10     7 18357.50     4      3    4
70     7 16502.83     4      3    5
20     8 18171.25     6      4    6
50    10 20659.80     7      5    7
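The two definitions given above (preceding rows plus one, and preceding distinct values plus one) can be checked by doing the counting by hand. The following is a sketch only, not DB2 code: it runs in SQLite (3.25 or later assumed) through Python's sqlite3 module, against a hypothetical copy of the seven STAFF rows listed in the answer above. The column names ending in _cnt are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (id INTEGER, years INTEGER)")
# The seven (id, years) pairs from the answer above.
conn.executemany("INSERT INTO staff VALUES (?,?)",
                 [(10, 7), (20, 8), (30, 5), (40, 6),
                  (50, 10), (70, 7), (90, 6)])

rows = conn.execute("""
    SELECT   s1.id
            ,s1.years
            ,RANK()       OVER(ORDER BY s1.years)        AS rnk
            ,(SELECT COUNT(*)                  -- preceding rows ...
              FROM   staff s2
              WHERE  s2.years < s1.years) + 1            AS rnk_cnt
            ,DENSE_RANK() OVER(ORDER BY s1.years)        AS dns
            ,(SELECT COUNT(DISTINCT s2.years)  -- preceding distinct values ...
              FROM   staff s2
              WHERE  s2.years < s1.years) + 1            AS dns_cnt
    FROM     staff s1
    ORDER BY s1.years, s1.id
""").fetchall()

for row in rows:
    print(row)
```

In every row the function result and the hand-counted value agree, and the gaps in the RANK column (2, 2, 4, 4, 6) line up with the tied YEARS values.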
The ORDER BY phrase, which is mandatory, gives a sequence to the ranking, and also tells
DB2 when to start a new rank value. The following query illustrates both uses:
SELECT   job
        ,years
        ,id
        ,name
        ,SMALLINT(RANK() OVER(ORDER BY job   ASC))  AS asc1
        ,SMALLINT(RANK() OVER(ORDER BY job   ASC
                                      ,years ASC))  AS asc2
        ,SMALLINT(RANK() OVER(ORDER BY job   ASC
                                      ,years ASC
                                      ,id    ASC))  AS asc3
        ,SMALLINT(RANK() OVER(ORDER BY job   DESC)) AS dsc1
        ,SMALLINT(RANK() OVER(ORDER BY job   DESC
                                      ,years DESC)) AS dsc2
        ,SMALLINT(RANK() OVER(ORDER BY job   DESC
                                      ,years DESC
                                      ,id    DESC)) AS dsc3
        ,SMALLINT(RANK() OVER(ORDER BY job   ASC
                                      ,years DESC
                                      ,id    ASC))  AS mix1
        ,SMALLINT(RANK() OVER(ORDER BY job   DESC
                                      ,years ASC
                                      ,id    DESC)) AS mix2
FROM     staff
WHERE    id      < 150
  AND    years  IN (6,7)
  AND    job     > 'L'
ORDER BY job
        ,years
        ,id;

ANSWER
================================================================
JOB   YEARS ID  NAME    ASC1 ASC2 ASC3 DSC1 DSC2 DSC3 MIX1 MIX2
----- ----- --- ------- ---- ---- ---- ---- ---- ---- ---- ----
Mgr       6 140 Fraye      1    1    1    4    6    6    3    4
Mgr       7  10 Sanders    1    2    2    4    4    5    1    6
Mgr       7 100 Plotz      1    2    3    4    4    4    2    5
Sales     6  40 O'Brien    4    4    4    1    2    3    5    2
Sales     6  90 Koonitz    4    4    5    1    2    2    6    1
Sales     7  70 Rothman    4    6    6    1    1    1    4    3
Observe above that adding more fields to the ORDER BY phrase resulted in more ranking
values being generated.
Ordering Nulls
When writing the ORDER BY, one can optionally specify whether or not null values should
be counted as high or low. The default, for an ascending field is that they are counted as high
(i.e. come last), and for a descending field, that they are counted as low:
SELECT   id
        ,years AS yr
        ,salary
        ,DENSE_RANK() OVER(ORDER BY years ASC)              AS a
        ,DENSE_RANK() OVER(ORDER BY years ASC  NULLS FIRST) AS af
        ,DENSE_RANK() OVER(ORDER BY years ASC  NULLS LAST ) AS al
        ,DENSE_RANK() OVER(ORDER BY years DESC)             AS d
        ,DENSE_RANK() OVER(ORDER BY years DESC NULLS FIRST) AS df
        ,DENSE_RANK() OVER(ORDER BY years DESC NULLS LAST ) AS dl
FROM     staff
WHERE    id < 100
ORDER BY years
        ,salary;

ANSWER
==================================
ID YR SALARY   A  AF AL D  DF DL
-- -- -------- -- -- -- -- -- --
30  5 17506.75  1  2  1  6  6  5
90  6 18001.75  2  3  2  5  5  4
40  6 18006.00  2  3  2  5  5  4
70  7 16502.83  3  4  3  4  4  3
10  7 18357.50  3  4  3  4  4  3
20  8 18171.25  4  5  4  3  3  2
50 10 20659.80  5  6  5  2  2  1
80  - 13504.60  6  1  6  1  1  6
60  - 16808.30  6  1  6  1  1  6
The DENSE_RANK and RANK functions include null values when calculating rankings. By
contrast, the COUNT DISTINCT statement excludes null values when counting values. Thus,
as is illustrated below, the two methods will differ (by one) when they are used to get a count of
distinct values - if there are nulls in the target data:
SELECT
ANSWER
=======
Y#1 Y#2
--- ---
  5   6
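The difference between the two counting methods can be reproduced with a small test. This is a sketch under stated assumptions, not the query that produced the answer above: it runs in SQLite (3.25 or later assumed) through Python's sqlite3 module, and the table holds only the YEARS values of the nine STAFF rows with an ID below 100, as listed earlier in this chapter. The column names y1 and y2 are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (id INTEGER, years INTEGER)")
# YEARS for ids 10 to 90; ids 60 and 80 have a null value.
conn.executemany("INSERT INTO staff VALUES (?,?)",
                 [(10, 7), (20, 8), (30, 5), (40, 6), (50, 10),
                  (60, None), (70, 7), (80, None), (90, 6)])

y1, y2 = conn.execute("""
    SELECT COUNT(DISTINCT years)   -- nulls are excluded
          ,MAX(dns)                -- nulls are ranked as one more value
    FROM  (SELECT years
                 ,DENSE_RANK() OVER(ORDER BY years) AS dns
           FROM   staff) AS xxx
""").fetchone()

print(y1, y2)
```

The highest dense rank is one more than the distinct count because the two null rows share a rank of their own.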
PARTITION Usage
The PARTITION phrase lets one rank the data by subsets of the rows returned. In the following example, the rows are ranked by salary within year:
SELECT   id
        ,years AS yr
        ,salary
        ,RANK() OVER(PARTITION BY years
                     ORDER     BY salary) AS r1
FROM     staff
WHERE    id     < 80
  AND    years IS NOT NULL
ORDER BY years
        ,salary;

ANSWER
=================
ID YR SALARY   R1
-- -- -------- --
30  5 17506.75  1
40  6 18006.00  1
70  7 16502.83  1
10  7 18357.50  2
20  8 18171.25  1
50 10 20659.80  1
SELECT   id
        ,years
        ,salary
        ,SMALLINT(RANK() OVER(ORDER BY years ASC))  AS rank_a
        ,SMALLINT(RANK() OVER(ORDER BY years DESC)) AS rank_d
        ,SMALLINT(RANK() OVER(ORDER BY id, years))  AS rank_iy
FROM     staff
WHERE    id     < 100
  AND    years IS NOT NULL
ORDER BY years;
If one wants to, one can do some really dumb rankings. All of the examples below are fairly
stupid, but arguably the dumbest of the lot is the last. In this case, the "ORDER BY 1" phrase
ranks the rows returned by the constant "one", so every row gets the same rank. By contrast
the "ORDER BY 1" phrase at the bottom of the query sequences the rows, and so has valid
business meaning:
SELECT   id
        ,years
        ,name
        ,salary
        ,SMALLINT(RANK() OVER(ORDER BY SUBSTR(name,3,2))) AS dumb1
        ,SMALLINT(RANK() OVER(ORDER BY salary / 1000))    AS dumb2
        ,SMALLINT(RANK() OVER(ORDER BY years * ID))       AS dumb3
        ,SMALLINT(RANK() OVER(ORDER BY rand()))           AS dumb4
        ,SMALLINT(RANK() OVER(ORDER BY 1))                AS dumb5
FROM     staff
WHERE    id     < 40
  AND    years IS NOT NULL
ORDER BY 1;

ANSWER
========================================================
ID YEARS NAME     SALARY   DUMB1 DUMB2 DUMB3 DUMB4 DUMB5
-- ----- -------- -------- ----- ----- ----- ----- -----
10     7 Sanders  18357.50     1     3     1     1     1
20     8 Pernal   18171.25     3     2     3     3     1
30     5 Marenghi 17506.75     2     1     2     2     1
Subsequent Processing
The ranking function gets the rank of the value as of when the function was applied. Subsequent processing may mean that the rank no longer makes sense. To illustrate this point, the
following query ranks the same field twice. Between the two ranking calls, some rows were
removed from the answer set, which has caused the ranking results to differ:
SELECT   xxx.*
        ,RANK() OVER(ORDER BY id) AS r2
FROM    (SELECT   id
                 ,name
                 ,RANK() OVER(ORDER BY id) AS r1
         FROM     staff
         WHERE    id     < 100
           AND    years IS NOT NULL
        )AS xxx
WHERE    id > 30
ORDER BY id;

ANSWER
================
ID NAME    R1 R2
-- ------- -- --
40 O'Brien  4  1
50 Hanes    5  2
70 Rothman  6  3
90 Koonitz  7  4
One can order the rows based on the output of a ranking function. This can let one sequence
the data in ways that might be quite difficult to do using ordinary SQL. For example, in the
following query the matching rows are ordered so that all those staff with the highest salary in
their respective department come first, followed by those with the second highest salary, and
so on. Within each ranking value, the person with the highest overall salary is listed first:
SELECT   id
        ,RANK() OVER(PARTITION BY dept
                     ORDER BY salary DESC) AS r1
        ,salary
        ,dept AS dp
FROM     staff
WHERE    id     < 80
  AND    years IS NOT NULL
ORDER BY r1     ASC
        ,salary DESC;

ANSWER
=================
ID R1 SALARY   DP
-- -- -------- --
50  1 20659.80 15
10  1 18357.50 20
40  1 18006.00 38
20  2 18171.25 20
30  2 17506.75 38
70  2 16502.83 15
SELECT   id
        ,(SELECT COUNT(*)
          FROM   staff s2
          WHERE  s2.id      < 80
            AND  s2.years  IS NOT NULL
            AND  s2.dept    = s1.dept
            AND  s2.salary >= s1.salary) AS r1
        ,salary
        ,dept AS dp
FROM     staff s1
WHERE    id     < 80
  AND    years IS NOT NULL
ORDER BY r1     ASC
        ,salary DESC;

ANSWER
=================
ID R1 SALARY   DP
-- -- -------- --
50  1 20659.80 15
10  1 18357.50 20
40  1 18006.00 38
20  2 18171.25 20
30  2 17506.75 38
70  2 16502.83 15
The nested table expression has to repeat all of the predicates in the main query, and have
predicates that define the ordering sequence. Thus it is hard to read.
The nested table expression will (inefficiently) join every matching row to all prior rows.
The ranking functions can also be used to retrieve the row with the highest value in a set of
rows. To do this, one must first generate the ranking in a nested table expression, and then
query the derived field later in the query. The following statement illustrates this concept by
getting the person, or persons, in each department with the highest salary:
SELECT   id
        ,salary
        ,dept AS dp
FROM    (SELECT   s1.*
                 ,RANK() OVER(PARTITION BY dept
                              ORDER BY salary DESC) AS r1
         FROM     staff s1
         WHERE    id     < 80
           AND    years IS NOT NULL
        )AS xxx
WHERE    r1 = 1
ORDER BY dp;

ANSWER
==============
ID SALARY   DP
-- -------- --
50 20659.80 15
10 18357.50 20
40 18006.00 38
Figure 257, Get highest salary in each department, use RANK function
Here is the same query, written using a correlated sub-query:
SELECT   id
        ,salary
        ,dept AS dp
FROM     staff s1
WHERE    id     < 80
  AND    years IS NOT NULL
  AND    NOT EXISTS
         (SELECT *
          FROM   staff s2
          WHERE  s2.id      < 80
            AND  s2.years  IS NOT NULL
            AND  s2.dept    = s1.dept
            AND  s2.salary  > s1.salary)
ORDER BY dp;

ANSWER
==============
ID SALARY   DP
-- -------- --
50 20659.80 15
10 18357.50 20
40 18006.00 38
Figure 258, Get highest salary in each department, use correlated sub-query
Here is the same query, written using an uncorrelated sub-query:
SELECT   id
        ,salary
        ,dept AS dp
FROM     staff
WHERE    id     < 80
  AND    years IS NOT NULL
  AND   (dept, salary) IN
         (SELECT   dept, MAX(salary)
          FROM     staff
          WHERE    id     < 80
            AND    years IS NOT NULL
          GROUP BY dept)
ORDER BY dp;

ANSWER
==============
ID SALARY   DP
-- -------- --
50 20659.80 15
10 18357.50 20
40 18006.00 38
Figure 259, Get highest salary in each department, use uncorrelated sub-query
Arguably, the first query above (i.e. the one using the RANK function) is the most elegant of
the series because it is the only statement where the basic predicates that define what rows
match are written once. With the two sub-query examples, these predicates have to be repeated, which can often lead to errors.
NOTE: If it seems at times that this chapter was written with a poison pen, it is because
just about now I had a "Microsoft moment" and my machine crashed. Needless to say, I
had backups and, needless to say, they got trashed. It took me four days to get back to
where I was. Thanks Bill - may you rot in hell. / Graeme
The ROW_NUMBER function lets one number the rows being returned. The result is of type
BIGINT. A syntax diagram follows. Observe that unlike with the ranking functions, the ORDER BY is not required:
ROW_NUMBER() OVER( [PARTITION BY partitioning expression, ...]
                   [ORDER BY ordering expression [asc option | desc option], ...] )
You don't have to provide an ORDER BY when using the ROW_NUMBER function, but not
doing so can be considered to be either brave or foolish, depending on one's outlook on life.
To illustrate this issue, consider the following query:
SELECT   id
        ,name
        ,ROW_NUMBER() OVER()            AS r1
        ,ROW_NUMBER() OVER(ORDER BY id) AS r2
FROM     staff
WHERE    id     < 50
  AND    years IS NOT NULL
ORDER BY id;

ANSWER
=================
ID NAME     R1 R2
-- -------- -- --
10 Sanders   1  1
20 Pernal    2  2
30 Marenghi  3  3
40 O'Brien   4  4
SELECT   id
        ,name
        ,ROW_NUMBER() OVER()              AS r1
        ,ROW_NUMBER() OVER(ORDER BY name) AS r2
FROM     staff
WHERE    id     < 50
  AND    years IS NOT NULL
ORDER BY id;

ANSWER
=================
ID NAME     R1 R2
-- -------- -- --
10 Sanders   4  4
20 Pernal    3  3
30 Marenghi  1  1
40 O'Brien   2  2
SELECT   id
        ,name
        ,ROW_NUMBER() OVER()              AS r1
        ,ROW_NUMBER() OVER(ORDER BY ID)   AS r2
        ,ROW_NUMBER() OVER(ORDER BY NAME) AS r3
FROM     staff
WHERE    id     < 50
  AND    years IS NOT NULL
ORDER BY id;

ANSWER
====================
ID NAME     R1 R2 R3
-- -------- -- -- --
10 Sanders   1  1  4
20 Pernal    2  2  3
30 Marenghi  3  3  1
40 O'Brien   4  4  2
The lesson to be learnt here is that the ROW_NUMBER function, when not given an explicit
ORDER BY, may create a value in any odd sequence. Usually, the sequence will reflect the
order in which the rows are returned - but not always.
PARTITION Usage
The PARTITION phrase lets one number the matching rows by subsets of the rows returned.
In the following example, the rows are both ranked and numbered within each JOB:
SELECT   job
        ,years
        ,id
        ,name
        ,ROW_NUMBER() OVER(PARTITION BY job
                           ORDER     BY years) AS row#
        ,RANK()       OVER(PARTITION BY job
                           ORDER     BY years) AS rn1#
        ,DENSE_RANK() OVER(PARTITION BY job
                           ORDER     BY years) AS rn2#
FROM     staff
WHERE    id      < 150
  AND    years  IN (6,7)
  AND    job     > 'L'
ORDER BY job
        ,years;

ANSWER
======================================
JOB   YEARS ID  NAME    ROW# RN1# RN2#
----- ----- --- ------- ---- ---- ----
Mgr       6 140 Fraye      1    1    1
Mgr       7  10 Sanders    2    2    2
Mgr       7 100 Plotz      3    2    2
Sales     6  40 O'Brien    1    1    1
Sales     6  90 Koonitz    2    1    1
Sales     7  70 Rothman    3    3    2
To query the output of the ROW_NUMBER function, one has to make a nested temporary
table that contains the function expression. In the following example, this technique is used to
limit the query to the first three matching rows:
SELECT   *
FROM    (SELECT   id
                 ,name
                 ,ROW_NUMBER() OVER(ORDER BY id) AS r
         FROM     staff
         WHERE    id     < 100
           AND    years IS NOT NULL
        )AS xxx
WHERE    r <= 3
ORDER BY id;

ANSWER
=============
ID NAME     R
-- -------- -
10 Sanders  1
20 Pernal   2
30 Marenghi 3
SELECT   id
        ,name
        ,ROW_NUMBER() OVER(ORDER BY id) AS r
FROM     staff
WHERE    id     < 100
  AND    years IS NOT NULL
ORDER BY id
FETCH FIRST 3 ROWS ONLY;

ANSWER
=============
ID NAME     R
-- -------- -
10 Sanders  1
20 Pernal   2
30 Marenghi 3
SELECT   *
FROM    (SELECT   id
                 ,name
                 ,ROW_NUMBER() OVER(ORDER BY id) AS r
         FROM     staff
         WHERE    id     < 200
           AND    years IS NOT NULL
        )AS xxx
WHERE    r BETWEEN 3 AND 6
ORDER BY id;

ANSWER
=============
ID NAME     R
-- -------- -
30 Marenghi 3
40 O'Brien  4
50 Hanes    5
70 Rothman  6
SELECT   *
FROM    (SELECT   id
                 ,name
                 ,ROW_NUMBER() OVER(ORDER BY id) AS r
         FROM     staff
         WHERE    id     < 200
           AND    years IS NOT NULL
        )AS xxx
WHERE   (r - 1) = ((r - 1) / 5) * 5
ORDER BY id;

ANSWER
==============
ID  NAME    R
--- ------- --
 10 Sanders  1
 70 Rothman  6
140 Fraye   11
190 Sneider 16
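The predicate "(r - 1) = ((r - 1) / 5) * 5" works because R is an integer, so the division throws away the remainder. The predicate is therefore true only when R minus one is an exact multiple of five. Here is a small sketch of that arithmetic. It is not DB2 code: it runs in SQLite (3.25 or later assumed) through Python's sqlite3 module, and the twenty ids (10, 20, ... 200) are hypothetical, so the ids returned differ from the answer above even though the row numbers are the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (id INTEGER NOT NULL)")
# Hypothetical ids 10, 20, ... 200 (the real STAFF data has gaps).
conn.executemany("INSERT INTO staff VALUES (?)",
                 [(i,) for i in range(10, 201, 10)])

rows = conn.execute("""
    SELECT   id, r
    FROM    (SELECT id
                   ,ROW_NUMBER() OVER(ORDER BY id) AS r
             FROM   staff) AS xxx
    WHERE   (r - 1) = ((r - 1) / 5) * 5   -- integer division truncates
    ORDER BY id
""").fetchall()

print(rows)
```

Changing the 5 to some other value "n" gets every nth row, always starting with the first.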
SELECT   *
FROM    (SELECT   id
                 ,name
                 ,ROW_NUMBER() OVER(ORDER BY id DESC) AS r
         FROM     staff
         WHERE    id     < 200
           AND    years IS NOT NULL
        )AS xxx
WHERE    r <= 2
ORDER BY id;

ANSWER
==============
ID  NAME     R
--- -------- -
180 Abrahams 2
190 Sneider  1
Imagine that one wants to fetch the first "n" rows in a query. This is easy to do, and has been
illustrated above. But imagine that one also wants to keep on fetching if the following rows
have the same value as the "nth".
In the next example, we will get the first three matching rows in the STAFF table, ordered by
years of service. However, if the 4th row, or any of the following rows, has the same YEARS
value as the 3rd row, then we also want to fetch them.
Select every matching row in the STAFF table, and give them all both a row-number and
a ranking value. Both values are assigned according to the order of the final output. Put
the result into a temporary table - TEMP1.
Query the TEMP1 table, getting the ranking of whatever row we want to stop fetching at.
In this case, it is the 3rd row. Put the result into a temporary table - TEMP2.
Finally, join to the two temporary tables. Fetch those rows in TEMP1 that have a ranking
that is less than or equal to the single row in TEMP2.
WITH
temp1(years, id, name, rnk, row) AS
  (SELECT years
         ,id
         ,name
         ,RANK()       OVER(ORDER BY years)
         ,ROW_NUMBER() OVER(ORDER BY years, id)
   FROM   staff
   WHERE  id     < 200
     AND  years IS NOT NULL
  ),
temp2(rnk) AS
  (SELECT rnk
   FROM   temp1
   WHERE  row = 3
  )
SELECT   temp1.*
FROM     temp1
        ,temp2
WHERE    temp1.rnk <= temp2.rnk
ORDER BY years
        ,id;

ANSWER
==========================
YEARS ID  NAME     RNK ROW
----- --- -------- --- ---
    3 180 Abrahams   1   1
    4 170 Kermisch   2   2
    5  30 Marenghi   3   3
    5 110 Ngan       3   4
Sometimes, one only wants to fetch the first "n" rows, where "n" is small, but the number of
matching rows is extremely large. In this section, we will discuss how to obtain these "n" rows
efficiently, which means that we will try to fetch just them without having to process any of
the many other matching rows.
Below is a sample invoice table. Observe that we have defined the INV# field as the primary
key, which means that DB2 will build a unique index on this column:
CREATE TABLE invoice
(inv#        INTEGER       NOT NULL
,customer#   INTEGER       NOT NULL
,sale_date   DATE          NOT NULL
,sale_value  DECIMAL(9,2)  NOT NULL
,CONSTRAINT ctx1 PRIMARY KEY (inv#)
,CONSTRAINT ctx2 CHECK(inv# >= 0));
AS
AS
AS
AS
inv#
customer#
sale_date
sale_value
SELECT   s.*
        ,ROW_NUMBER() OVER() AS row#
FROM     invoice s
ORDER BY inv#
FETCH FIRST 5 ROWS ONLY;
Figure 275, Fetch first 5 rows+ number rows - 0.672 elapsed seconds
All of the above queries have processed all 500,000 matching rows, sorted them, and then
fetched the first five. We can do much better if we somehow only process the five rows that
we want to fetch, which is what the next query does:
SELECT   *
FROM    (SELECT   s.*
                 ,ROW_NUMBER() OVER() AS row#
         FROM     invoice s
        )xxx
WHERE    row# <= 5
ORDER BY inv#;
Figure 276, Process and number 5 rows only - 0.000 elapsed seconds
In the above query the "OVER()" phrase told DB2 to assign row numbers in the output order.
In the next query we explicitly provide the row-number sequence, which happens to be the
same as the ORDER BY sequence, but DB2 can't figure that out, so this query costs more:
SELECT   *
FROM    (SELECT   s.*
                 ,ROW_NUMBER() OVER(ORDER BY inv#) AS row#
         FROM     invoice s
        )xxx
WHERE    row# <= 5
ORDER BY inv#;
Figure 277, Process and number 5 rows only - 0.281 elapsed seconds
One can also use recursion to get the first "n" rows. One begins by getting the first matching
row, and then uses that row to get the next, and then the next, and so on (in a recursive join),
until the required number of rows have been obtained.
In the following example, we start by getting the row with the MIN invoice-number. This row
is then joined to the row with the next to lowest invoice-number, which is then joined to the
next, and so on. After five such joins, the cycle is stopped and the result is selected:
WITH temp (inv#, c#, sd, sv, n) AS
  (SELECT inv.*
         ,1
   FROM   invoice inv
   WHERE  inv# =
          (SELECT MIN(inv#)
           FROM   invoice)
   UNION  ALL
   SELECT new.*, n + 1
   FROM   temp    old
         ,invoice new
   WHERE  old.inv# < new.inv#
     AND  old.n    < 5
     AND  new.inv# =
          (SELECT MIN(xxx.inv#)
           FROM   invoice xxx
           WHERE  xxx.inv# > old.inv#)
  )
SELECT   *
FROM     temp;
It requires all primary predicates (e.g. get only those rows where the sale-value is greater
than $10,000, and the sale-date greater than last month) to be repeated four times. In the
above example there are none, which is unusual in the real world.
It quickly becomes both very complicated and quite inefficient when the sequencing
value is made up of multiple fields. In the above example, we sequenced by the INV#
column, but imagine if we had used the sale-date, sale-value, and customer-number.
In this section we have illustrated how minor changes to the SQL syntax can cause major
changes in query performance. But to illustrate this phenomenon, we used a set of queries
with 500,000 matching rows. In situations where there are far fewer matching rows, one can
reasonably assume that this problem is not an issue.
Aggregation Function
The various aggregation functions let one do cute things like get cumulative totals or running
averages. In some ways, they can be considered to be extensions of the existing DB2 column
functions. The output type is dependent upon the input type.
column-function OVER()
column-function OVER( [PARTITION BY partitioning expression, ...]
                      [ORDER BY ordering expression [asc option | desc option], ...]
                      [ROWS | RANGE  row-or-range specification] )

row-or-range specification, either a starting point only:
    UNBOUNDED PRECEDING | unsigned-constant PRECEDING | CURRENT ROW
or a BETWEEN ... AND ... pair:
    BETWEEN  UNBOUNDED PRECEDING | unsigned-constant PRECEDING |
             unsigned-constant FOLLOWING | CURRENT ROW
    AND      UNBOUNDED FOLLOWING | unsigned-constant PRECEDING |
             unsigned-constant FOLLOWING | CURRENT ROW
Any DB2 column function (e.g. AVG, SUM, COUNT) can use the aggregation function.
The OVER() usage aggregates all of the matching rows. This is equivalent to getting the
current row, and also applying a column function (e.g. MAX, SUM) against all of the
matching rows (see page 105).
The PARTITION phrase limits any aggregation to a subset of the matching rows.
The ORDER BY phrase has two purposes: It defines a set of values to do aggregations
on. Each distinct value gets a new result. It also defines a direction for the aggregation
function processing - either ascending or descending (see page 106).
If an ORDER BY phrase is provided, but neither a RANGE nor ROWS is specified, then
the aggregation is done from the first row to the current row.
The ROWS phrase limits the aggregation result to a set of rows - defined relative to the
current row being processed. The applicable rows can either be already processed (i.e.
preceding) or not yet processed (i.e. following), or both (see page 107).
The RANGE phrase limits the aggregation result to a range of values - defined relative to
the value of the current row being processed. The range is calculated by taking the value
in the current row (defined by the ORDER BY phrase) and adding to and/or subtracting
from it, then seeing what other rows are in the range. For this reason, when RANGE is
used, only one expression can be specified in the aggregation function ORDER BY, and
the expression must be numeric (see page 110).
Preceding rows have already been fetched. Thus, the phrase "ROWS 3 PRECEDING"
refers to the 3 preceding rows - plus the current row. The phrase "UNBOUNDED
PRECEDING" refers to all those rows (in the partition) that have already been fetched,
plus the current one.
Following rows have yet to be fetched. The phrase "UNBOUNDED FOLLOWING" refers to all those rows (in the partition) that have yet to be fetched, plus the current one.
The phrase CURRENT ROW refers to the current row. It is equivalent to getting zero
preceding and following rows.
If either a ROWS or a RANGE phrase is used, but no BETWEEN is provided, then one
must provide a starting point for the aggregation (e.g. ROWS 1 PRECEDING). The starting point must either precede or equal the current row - it cannot follow it. The implied
end point is the current row.
When using the BETWEEN phrase, put the "low" value in the first check and the "high"
value in the second check. Thus one can go from the 1 PRECEDING to the CURRENT
ROW, or from the CURRENT ROW to 1 FOLLOWING, but not the other way round.
The set of rows that match the BETWEEN phrase differ depending upon whether the
aggregation function ORDER BY is ascending or descending.
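Several of the rules listed above can be seen side by side in one small query. What follows is a sketch only, not DB2 code: it runs in SQLite (3.25 or later assumed) through Python's sqlite3 module, against a hypothetical copy of seven STAFF rows (DEPT, NAME and YEARS only) taken from the examples in this chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (dept INTEGER, name TEXT, years INTEGER)")
conn.executemany("INSERT INTO staff VALUES (?,?,?)",
                 [(15, "Hanes", 10), (15, "Rothman", 7), (20, "Pernal", 8),
                  (20, "Sanders", 7), (38, "Marenghi", 5),
                  (38, "O'Brien", 6), (42, "Koonitz", 6)])

rows = conn.execute("""
    SELECT   name
             -- ORDER BY only: first row to current row; equal DEPT values
             -- are treated as one value and so share a result
            ,SUM(years) OVER(ORDER BY dept)                     AS d
             -- unique ORDER BY: an ordinary running total
            ,SUM(years) OVER(ORDER BY dept, name)               AS dn
             -- ROWS with a start only: implied end is the current row
            ,SUM(years) OVER(ORDER BY dept, name
                             ROWS 1 PRECEDING)                  AS dn1
             -- BETWEEN: low value first, high value second
            ,SUM(years) OVER(ORDER BY dept, name
                             ROWS BETWEEN 1 PRECEDING
                                      AND 1 FOLLOWING)          AS pf1
    FROM     staff
    ORDER BY dept, name
""").fetchall()

for row in rows:
    print(row)
```

Observe that the D column moves in steps, one per department, while the DN column changes on every row. The two ROWS columns never look further than one row away from the current row.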
Basic Usage
In its simplest form, with just an "OVER()" phrase, an aggregation function works on all of
the matching rows, running the column function specified. Thus, one gets both the detailed
data, plus the SUM, or AVG, or whatever, of all the matching rows.
In the following example, five rows are selected from the STAFF table. Along with various
detailed fields, the query also gets some summary data about the matching rows:
SELECT   id
        ,name
        ,salary
        ,SUM(salary) OVER() AS sum_sal
        ,AVG(salary) OVER() AS avg_sal
        ,MIN(salary) OVER() AS min_sal
        ,MAX(salary) OVER() AS max_sal
        ,COUNT(*)    OVER() AS #rows
FROM     staff
WHERE    id < 60
ORDER BY id;

ANSWER
==============================================================
ID NAME     SALARY   SUM_SAL  AVG_SAL  MIN_SAL  MAX_SAL  #ROWS
-- -------- -------- -------- -------- -------- -------- -----
10 Sanders  18357.50 92701.30 18540.26 17506.75 20659.80     5
20 Pernal   18171.25 92701.30 18540.26 17506.75 20659.80     5
30 Marenghi 17506.75 92701.30 18540.26 17506.75 20659.80     5
40 O'Brien  18006.00 92701.30 18540.26 17506.75 20659.80     5
50 Hanes    20659.80 92701.30 18540.26 17506.75 20659.80     5
SELECT   id
        ,name
        ,salary
        ,SUM(salary) OVER()                                  AS sum1
        ,SUM(salary) OVER(ORDER BY id * 0)                   AS sum2
        ,SUM(salary) OVER(ORDER BY 'ABC')                    AS sum3
        ,SUM(salary) OVER(ORDER BY 'ABC'
                          RANGE BETWEEN UNBOUNDED PRECEDING
                                    AND UNBOUNDED FOLLOWING) AS sum4
FROM     staff
WHERE    id < 60
ORDER BY id;

ANSWER
========================================================
ID NAME     SALARY   SUM1     SUM2     SUM3     SUM4
-- -------- -------- -------- -------- -------- --------
10 Sanders  18357.50 92701.30 92701.30 92701.30 92701.30
20 Pernal   18171.25 92701.30 92701.30 92701.30 92701.30
30 Marenghi 17506.75 92701.30 92701.30 92701.30 92701.30
40 O'Brien  18006.00 92701.30 92701.30 92701.30 92701.30
50 Hanes    20659.80 92701.30 92701.30 92701.30 92701.30
It provides a set of values to do aggregations on. Each distinct value gets a new result.
In the next query, various aggregations are done on the DEPT field, which is not unique, and
on the DEPT and NAME fields combined, which are unique (for these rows). Both ascending
and descending aggregations are illustrated:
SELECT   dept
        ,name
        ,salary
        ,SUM(salary) OVER(ORDER BY dept)                 AS sum1
        ,SUM(salary) OVER(ORDER BY dept DESC)            AS sum2
        ,SUM(salary) OVER(ORDER BY dept, NAME)           AS sum3
        ,SUM(salary) OVER(ORDER BY dept DESC, name DESC) AS sum4
        ,COUNT(*)    OVER(ORDER BY dept)                 AS row1
        ,COUNT(*)    OVER(ORDER BY dept, NAME)           AS row2
FROM     staff
WHERE    id < 60
ORDER BY dept
        ,name;

ANSWER
====================================================================
DEPT NAME     SALARY   SUM1     SUM2     SUM3     SUM4     ROW1 ROW2
---- -------- -------- -------- -------- -------- -------- ---- ----
  15 Hanes    20659.80 20659.80 92701.30 20659.80 92701.30    1    1
  20 Pernal   18171.25 57188.55 72041.50 38831.05 72041.50    3    2
  20 Sanders  18357.50 57188.55 72041.50 57188.55 53870.25    3    3
  38 Marenghi 17506.75 92701.30 35512.75 74695.30 35512.75    5    4
  38 O'Brien  18006.00 92701.30 35512.75 92701.30 18006.00    5    5
The ROWS phrase can be used to limit the aggregation function to a subset of the matching
rows or distinct values. If no ROWS or RANGE phrase is provided, the aggregation is done
for all preceding rows, up to the current row. Likewise, if no BETWEEN phrase is provided,
the aggregation is done from the start-location given, up to the current row. In the following
query, all of the examples using the ROWS phrase are of this type:
SELECT   dept
        ,name
        ,years
        ,SMALLINT(SUM(years) OVER(ORDER BY dept))               AS d
        ,SMALLINT(SUM(years) OVER(ORDER BY dept, name))         AS dn
        ,SMALLINT(SUM(years) OVER(ORDER BY dept, name
                                  ROWS UNBOUNDED PRECEDING))    AS dnu
        ,SMALLINT(SUM(years) OVER(ORDER BY dept, name
                                  ROWS 3 PRECEDING))            AS dn3
        ,SMALLINT(SUM(years) OVER(ORDER BY dept, name
                                  ROWS 1 PRECEDING))            AS dn1
        ,SMALLINT(SUM(years) OVER(ORDER BY dept, name
                                  ROWS 0 PRECEDING))            AS dn0
        ,SMALLINT(SUM(years) OVER(ORDER BY dept, name
                                  ROWS CURRENT ROW))            AS dnc
        ,SMALLINT(SUM(years) OVER(ORDER BY dept DESC, name DESC
                                  ROWS 1 PRECEDING))            AS dnx
FROM     staff
WHERE    id     < 100
  AND    years IS NOT NULL
ORDER BY dept
        ,name;
Figure 287, Starting ROWS usage. Implied end is current row, SQL
Below is the answer. Observe that an aggregation starting at the current row, or including
zero preceding rows, doesn't aggregate anything other than the current row:
DEPT NAME     YEARS  D DN DNU DN3 DN1 DN0 DNC DNX
---- -------- ----- -- -- --- --- --- --- --- ---
  15 Hanes       10 17 10  10  10  10  10  10  17
  15 Rothman      7 17 17  17  17  17   7   7  15
  20 Pernal       8 32 25  25  25  15   8   8  15
  20 Sanders      7 32 32  32  32  15   7   7  12
  38 Marenghi     5 43 37  37  27  12   5   5  11
  38 O'Brien      6 43 43  43  26  11   6   6  12
  42 Koonitz      6 49 49  49  24  12   6   6   6
Figure 288, Starting ROWS usage. Implied end is current row, Answer
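The implied end point can also be checked without the STAFF table. The following is a minimal sketch, assuming a made-up four-row table (TEMP1 and its column N are hypothetical names used only for illustration). The short form "ROWS 1 PRECEDING" and the long form "ROWS BETWEEN 1 PRECEDING AND CURRENT ROW" should return the same value in every row:

```sql
-- TEMP1 and N are illustrative names, not part of the sample database.
WITH temp1 (n) AS
  (VALUES (1),(2),(3),(4))
SELECT   n
        ,SUM(n) OVER(ORDER BY n
                     ROWS 1 PRECEDING)               AS short_form
        ,SUM(n) OVER(ORDER BY n
                     ROWS BETWEEN 1 PRECEDING
                              AND CURRENT ROW)       AS long_form
FROM     temp1
ORDER BY n;
```

Distinct values were used for N so that the row order, and thus the answer, is unambiguous.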
BETWEEN Usage
In the next query, the BETWEEN phrase is used to explicitly define the start and end rows
that are used in the aggregation:
SELECT   dept
        ,name
        ,years
        ,SMALLINT(SUM(years) OVER(ORDER BY dept, name))              AS uc1
        ,SMALLINT(SUM(years) OVER(ORDER BY dept, name
                                  ROWS UNBOUNDED PRECEDING))         AS uc2
        ,SMALLINT(SUM(years) OVER(ORDER BY dept, name
                                  ROWS BETWEEN UNBOUNDED PRECEDING
                                           AND CURRENT ROW))         AS uc3
        ,SMALLINT(SUM(years) OVER(ORDER BY dept, name
                                  ROWS BETWEEN CURRENT ROW
                                           AND CURRENT ROW))         AS cu1
        ,SMALLINT(SUM(years) OVER(ORDER BY dept, name
                                  ROWS BETWEEN 1 PRECEDING
                                           AND 1 FOLLOWING))         AS pf1
        ,SMALLINT(SUM(years) OVER(ORDER BY dept, name
                                  ROWS BETWEEN 2 PRECEDING
                                           AND 2 FOLLOWING))         AS pf2
        ,SMALLINT(SUM(years) OVER(ORDER BY dept, name
                                  ROWS BETWEEN 3 PRECEDING
                                           AND 3 FOLLOWING))         AS pf3
        ,SMALLINT(SUM(years) OVER(ORDER BY dept, name
                                  ROWS BETWEEN CURRENT ROW
                                           AND UNBOUNDED FOLLOWING)) AS cu1
        ,SMALLINT(SUM(years) OVER(ORDER BY dept, name
                                  ROWS BETWEEN UNBOUNDED PRECEDING
                                           AND UNBOUNDED FOLLOWING)) AS uu1
FROM     staff
WHERE    id < 100
  AND    years IS NOT NULL
ORDER BY dept
        ,name;
DEPT NAME     YEARS UC1 UC2 UC3 CU1 PF1 PF2 PF3 CU1 UU1
---- -------- ----- --- --- --- --- --- --- --- --- ---
  15 Hanes       10  10  10  10  10  17  25  32  49  49
  15 Rothman      7  17  17  17   7  25  32  37  39  49
  20 Pernal       8  25  25  25   8  22  37  43  32  49
  20 Sanders      7  32  32  32   7  20  33  49  24  49
  38 Marenghi     5  37  37  37   5  18  32  39  17  49
  38 O'Brien      6  43  43  43   6  17  24  32  12  49
  42 Koonitz      6  49  49  49   6  12  17  24   6  49
The BETWEEN predicate in an ordinary SQL statement is used to get those rows that have a
value between the specified low-value (given first) and the high value (given last). Thus the
predicate "BETWEEN 5 AND 10" may find rows, but the predicate "BETWEEN 10 AND 5"
will never find any.
The BETWEEN phrase in an aggregation function has a similar usage in that it defines the set
of rows to be aggregated. But it differs in that the answer depends upon the function ORDER
BY sequence, and a non-match returns a null value, not no-rows.
Below is some sample SQL. Observe that the first two aggregations are ascending, while the
last two are descending:
SELECT   id
        ,name
        ,SMALLINT(SUM(id) OVER(ORDER BY id ASC
                               ROWS BETWEEN 1 PRECEDING
                                        AND CURRENT ROW)) AS apc
        ,SMALLINT(SUM(id) OVER(ORDER BY id ASC
                               ROWS BETWEEN CURRENT ROW
                                        AND 1 FOLLOWING)) AS acf
        ,SMALLINT(SUM(id) OVER(ORDER BY id DESC
                               ROWS BETWEEN 1 PRECEDING
                                        AND CURRENT ROW)) AS dpc
        ,SMALLINT(SUM(id) OVER(ORDER BY id DESC
                               ROWS BETWEEN CURRENT ROW
                                        AND 1 FOLLOWING)) AS dcf
FROM     staff
WHERE    id < 50
  AND    years IS NOT NULL
ORDER BY id;

ANSWER
===========================
ID NAME     APC ACF DPC DCF
-- -------- --- --- --- ---
10 Sanders   10  30  30  10
20 Pernal    30  50  50  30
30 Marenghi  50  70  70  50
40 O'Brien   70  40  40  70

ASC id (10,20,30,40)
READ ROWS, LEFT to RIGHT     1ST-ROW   2ND-ROW   3RD-ROW   4TH-ROW
==========================   ========  ========  ========  ========
1 PRECEDING to CURRENT ROW   10   =10  10+20=30  20+30=50  30+40=70
CURRENT ROW to 1 FOLLOWING   10+20=30  20+30=50  30+40=70  40   =40

DESC id (40,30,20,10)
READ ROWS, RIGHT to LEFT     1ST-ROW   2ND-ROW   3RD-ROW   4TH-ROW
==========================   ========  ========  ========  ========
1 PRECEDING to CURRENT ROW   20+10=30  30+20=50  40+30=70  40   =40
CURRENT ROW to 1 FOLLOWING   10   =10  20+10=30  30+20=50  40+30=70
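The point about a non-match returning a null value (rather than no row) can be seen with a window that excludes the current row. What follows is only a sketch, using a made-up table (TEMP1 is a hypothetical name) so that it can be run without the STAFF table:

```sql
-- TEMP1 is an illustrative table, not part of the sample database.
-- The window covers the two rows BEFORE the current row, so for the
-- first row there is nothing to aggregate.
WITH temp1 (id) AS
  (VALUES (10),(20),(30),(40))
SELECT   id
        ,SUM(id) OVER(ORDER BY id
                      ROWS BETWEEN 2 PRECEDING
                               AND 1 PRECEDING) AS prior2
FROM     temp1
ORDER BY id;
```

All four rows should come back. The PRIOR2 value should be null for ID 10 (no prior rows), 10 for ID 20, 30 for ID 30 (10+20), and 50 for ID 40 (20+30).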
RANGE Usage
The RANGE phrase limits the aggregation result to a range of numeric values, defined relative
to the value of the current row being processed. The range is obtained by taking the value
in the current row (defined by the ORDER BY expression) and adding to and/or subtracting
from it, then seeing what other rows are in the range. Note that only one expression can be
specified in the ORDER BY, and that expression must be numeric.
In the following example, the RANGE phrase adds to and/or subtracts from the DEPT field.
For example, in the function that is used to populate the RG10 field, the current DEPT value
is checked against the preceding DEPT values. If a preceding value is within 10 of the current
value, the related YEARS field is added to the SUM:
SELECT   dept
        ,name
        ,years
        ,SMALLINT(SUM(years) OVER(ORDER BY dept
                                  ROWS  BETWEEN 1 PRECEDING
                                            AND CURRENT ROW))  AS row1
        ,SMALLINT(SUM(years) OVER(ORDER BY dept
                                  ROWS  BETWEEN 2 PRECEDING
                                            AND CURRENT ROW))  AS row2
        ,SMALLINT(SUM(years) OVER(ORDER BY dept
                                  RANGE BETWEEN 1 PRECEDING
                                            AND CURRENT ROW))  AS rg01
        ,SMALLINT(SUM(years) OVER(ORDER BY dept
                                  RANGE BETWEEN 10 PRECEDING
                                            AND CURRENT ROW))  AS rg10
        ,SMALLINT(SUM(years) OVER(ORDER BY dept
                                  RANGE BETWEEN 20 PRECEDING
                                            AND CURRENT ROW))  AS rg20
        ,SMALLINT(SUM(years) OVER(ORDER BY dept
                                  RANGE BETWEEN 10 PRECEDING
                                            AND 20 FOLLOWING)) AS rg11
        ,SMALLINT(SUM(years) OVER(ORDER BY dept
                                  RANGE BETWEEN CURRENT ROW
                                            AND 20 FOLLOWING)) AS rg99
FROM     staff
WHERE    id < 100
  AND    years IS NOT NULL
ORDER BY dept
        ,name;
DEPT NAME    YEARS ROW1 ROW2 RG01 RG10 RG20 RG11 RG99
---- ------- ----- ---- ---- ---- ---- ---- ---- ----
  15 Hanes      10   10   10   17   17   17   32   32
  15 Rothman     7   17   17   17   17   17   32   32
  20 Pernal      8   15   25   15   32   32   43   26
  20 Sanders     7   15   22   15   32   32   43   26
  38 Marengh     5   12   20   11   11   26   17   17
  38 O'Brien     6   11   18   11   11   26   17   17
  42 Koonitz     6   12   17    6   17   17   17    6
The ROWS expression refers to the "n" rows before and/or after (within the partition), as
defined by the ORDER BY.
The RANGE expression refers to those before and/or after rows (within the partition) that
are within an arithmetic range of the current row.
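To make the difference concrete, here is a small sketch that applies the same "1 PRECEDING" bound as both a ROWS and a RANGE expression. The table TEMP1 and its columns K and V are hypothetical, chosen so that there is a gap in the K values:

```sql
-- TEMP1, K and V are illustrative names, not part of the sample database.
WITH temp1 (k, v) AS
  (VALUES (1,1),(2,10),(3,100),(10,1000))
SELECT   k
        ,v
        ,SUM(v) OVER(ORDER BY k
                     ROWS  BETWEEN 1 PRECEDING
                               AND CURRENT ROW) AS by_rows
        ,SUM(v) OVER(ORDER BY k
                     RANGE BETWEEN 1 PRECEDING
                               AND CURRENT ROW) AS by_range
FROM     temp1
ORDER BY k;
```

For the first three rows both columns should agree (1, 11, and 110), because the prior row is also within one of the current K value. In the last row they should differ: BY_ROWS adds the physically prior row (100+1000=1100), while BY_RANGE finds no other row with a K value between 9 and 10, and so returns 1000.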
PARTITION Usage
One can take all of the lovely stuff described above, and make it a whole lot more complicated
by using the PARTITION expression. This phrase limits the current processing of the
aggregation to a subset of the matching rows.
In the following query, some of the aggregation functions are broken up by partition range
and some are not. When there is a partition, then the ROWS check only works within the
range of the partition (i.e. for a given DEPT):
SELECT   dept
        ,name
        ,years
        ,SMALLINT(SUM(years) OVER(ORDER BY dept))                 AS x
        ,SMALLINT(SUM(years) OVER(ORDER BY dept
                                  ROWS 3 PRECEDING))              AS xo3
        ,SMALLINT(SUM(years) OVER(ORDER BY dept
                                  ROWS BETWEEN 1 PRECEDING
                                           AND 1 FOLLOWING))      AS xo11
        ,SMALLINT(SUM(years) OVER(PARTITION BY dept))             AS p
        ,SMALLINT(SUM(years) OVER(PARTITION BY dept
                                  ORDER     BY dept))             AS po
        ,SMALLINT(SUM(years) OVER(PARTITION BY dept
                                  ORDER     BY dept
                                  ROWS 1 PRECEDING))              AS po1
        ,SMALLINT(SUM(years) OVER(PARTITION BY dept
                                  ORDER     BY dept
                                  ROWS 3 PRECEDING))              AS po3
        ,SMALLINT(SUM(years) OVER(PARTITION BY dept
                                  ORDER     BY dept
                                  ROWS BETWEEN 1 PRECEDING
                                           AND 1 FOLLOWING))      AS po11
FROM     staff
WHERE    id BETWEEN 40 AND 120
  AND    years IS NOT NULL
ORDER BY dept
        ,name;
DEPT NAME    YEARS X    XO3  XO11 P    PO   PO1  PO3  PO11
---- ------- ----- ---- ---- ---- ---- ---- ---- ---- ----
  15 Hanes      10   22   10   15   22   22   10   10   15
  15 Ngan        5   22   15   22   22   22   15   15   22
  15 Rothman     7   22   22   18   22   22   12   22   12
  38 O'Brien     6   28   28   19    6    6    6    6    6
  42 Koonitz     6   41   24   19   13   13    6    6   13
  42 Plotz       7   41   26   13   13   13   13   13   13
The PARTITION clause, when used by itself, returns a very similar result to a GROUP BY,
except that it does not remove the duplicate rows. To illustrate, below is a simple query that
does a GROUP BY:
SELECT   dept
        ,SUM(years) AS sum
        ,AVG(years) AS avg
        ,COUNT(*)   AS row
FROM     staff
WHERE    id BETWEEN 40 AND 120
  AND    years IS NOT NULL
GROUP BY dept;

ANSWER
================
DEPT SUM AVG ROW
---- --- --- ---
  15  22   7   3
  38   6   6   1
  42  13   6   2
Below is a similar query that uses the PARTITION phrase. Observe that the answer is the
same, except that duplicate rows have not been removed:
SELECT   dept
        ,SUM(years) OVER(PARTITION BY dept) AS sum
        ,AVG(years) OVER(PARTITION BY dept) AS avg
        ,COUNT(*)   OVER(PARTITION BY dept) AS row
FROM     staff
WHERE    id BETWEEN 40 AND 120
  AND    years IS NOT NULL
ORDER BY dept;

ANSWER
=================
DEPT  SUM AVG ROW
----- --- --- ---
   15  22   7   3
   15  22   7   3
   15  22   7   3
   38   6   6   1
   42  13   6   2
   42  13   6   2
SELECT   DISTINCT dept
        ,SUM(years) OVER(PARTITION BY dept) AS sum
        ,AVG(years) OVER(PARTITION BY dept) AS avg
        ,COUNT(*)   OVER(PARTITION BY dept) AS row
FROM     staff
WHERE    id BETWEEN 40 AND 120
  AND    years IS NOT NULL
ORDER BY dept;

ANSWER
=================
DEPT  SUM AVG ROW
----- --- --- ---
   15  22   7   3
   38   6   6   1
   42  13   6   2
Scalar Functions
Introduction
Scalar functions act on a single row at a time. In this section we shall list all of the ones that
come with DB2 and look in detail at some of the more interesting ones. Refer to the SQL
Reference for information on those functions not fully described here.
WARNING: Some of the scalar functions changed their internal logic between V5 and V6
of DB2. There have been no changes between V6 and V7, nor between V7 and V8, except for the addition of a few more functions.
Sample Data
The following self-defined view will be used throughout this section to illustrate how some of
the following functions work. Observe that the view has a VALUES expression that defines
the contents: three rows and nine columns.
CREATE VIEW scalar (d1,f1,s1,c1,v1,ts1,dt1,tm1,tc1) AS
WITH temp1 (n1, c1, t1) AS
(VALUES  (-2.4,'ABCDEF','1996-04-22-23.58.58.123456')
        ,(+0.0,'ABCD  ','1996-08-15-15.15.15.151515')
        ,(+1.8,'AB    ','0001-01-01-00.00.00.000000'))
SELECT   DECIMAL(n1,3,1)
        ,DOUBLE(n1)
        ,SMALLINT(n1)
        ,CHAR(c1,6)
        ,VARCHAR(RTRIM(c1),6)
        ,TIMESTAMP(t1)
        ,DATE(t1)
        ,TIME(t1)
        ,CHAR(t1)
FROM     temp1;
D1    F1        S1 C1     V1     TS1
----- --------- -- ------ ------ --------------------------
 -2.4 -2.4e+000 -2 ABCDEF ABCDEF 1996-04-22-23.58.58.123456
  0.0  0.0e+000  0 ABCD   ABCD   1996-08-15-15.15.15.151515
  1.8  1.8e+000  1 AB     AB     0001-01-01-00.00.00.000000

DT1        TM1      TC1
---------- -------- --------------------------
1996-04-22 23:58:58 1996-04-22-23.58.58.123456
1996-08-15 15:15:15 1996-08-15-15.15.15.151515
0001-01-01 00:00:00 0001-01-01-00.00.00.000000
ABS
Returns the absolute value of a number (e.g. -0.4 returns + 0.4). The output field type will
equal the input field type (i.e. double input returns double output).
SELECT   d1      AS d1
        ,ABS(D1) AS d2
        ,f1      AS f1
        ,ABS(f1) AS f2
FROM     scalar;
ACOS
Returns the arccosine of the argument as an angle expressed in radians. The output format is
double.
ASCII
Returns the ASCII code value of the leftmost input character. Valid input types are any valid
character type up to 1 MEG. The output type is integer.
SELECT   c1
        ,ASCII(c1)             AS ac1
        ,ASCII(SUBSTR(c1,2))   AS ac2
FROM     scalar
WHERE    c1 = 'ABCDEF';

ANSWER
================
C1     AC1 AC2
------ --- ---
ABCDEF  65  66
ASIN
Returns the arcsine of the argument as an angle expressed in radians. The output format is
double.
ATAN
Returns the arctangent of the argument as an angle expressed in radians. The output format is
double.
ATANH
Returns the hyperbolic arctangent of the argument, where the argument is an angle expressed in radians. The output format is double.
ATAN2
Returns the arctangent of x and y coordinates, specified by the first and second arguments, as
an angle, expressed in radians. The output format is double.
BIGINT
Converts the input value to bigint (big integer) format. The input can be either numeric or
character. If character, it must be a valid representation of a number.
ANSWER
====================
BIG
--------------------
                   1
                 256
               65536
            16777216
          4294967296
       1099511627776
     281474976710656
   72057594037927936

DECIMAL1             BIGINT1
-------------------- -------------------
                  1.                   1
                123.                 123
              12345.               12345
            1234567.             1234567
          123456789.           123456788
        12345678900.         12345678899
      1234567890000.       1234567889999
    123456789000000.     123456788999999
  12345678900000000.   12345678899999996
1234567890000000000. 1234567889999999488
BLOB
Converts the input (1st argument) to a blob. The output length (2nd argument) is optional.
BLOB ( string-expression [, length] )
CEIL or CEILING
Returns the next smallest integer value that is greater than or equal to the input (e.g. 5.045
returns 6.000). The output field type will equal the input field type.
CEIL or CEILING ( numeric-expression )
SELECT d1
,CEIL(d1) AS d2
,f1
,CEIL(f1) AS f2
FROM
scalar;
CHAR
The CHAR function has a multiplicity of uses. The result is always a fixed-length character
value, but what happens to the input along the way depends upon the input type:
For character input, the CHAR function acts a bit like the SUBSTR function, except that
it can only truncate starting from the left-most character. The optional length parameter,
if provided, must be a constant or keyword.
Date-time input is converted into an equivalent character string. Optionally, the external
format can be explicitly specified (i.e. ISO, USA, EUR, JIS, or LOCAL).
Decimal input is converted into a right-justified character string with leading zeros. The
format of the decimal point can optionally be provided. The default decimal point is a
dot. The + and - symbols are not allowed as they are used as sign indicators.
CHAR ( character value [, length] )
CHAR ( date-time value [, format] )
CHAR ( integer value )
CHAR ( double value )
CHAR ( decimal value [, dec.pt] )
SELECT   name
        ,CHAR(name,3)
        ,comm
        ,CHAR(comm)
        ,CHAR(comm,'@')
FROM     staff
WHERE    id BETWEEN 80
            AND 100
ORDER BY id;

ANSWER
=====================================
NAME    2   COMM    4        5
------- --- ------- -------- --------
James   Jam  128.20 00128.20 00128@20
Koonitz Koo 1386.70 01386.70 01386@70
Plotz   Plo       - -        -
ANSWER
==========================================
INT      CHAR_INT CHAR_FLT    CHAR_DEC
-------- -------- ----------- ------------
       3 3        3.0E0       00000000003.
       9 9        9.0E0       00000000009.
      81 81       8.1E1       00000000081.
    6561 6561     6.561E3     00000006561.
43046721 43046721 4.3046721E7 00043046721.

ANSWER
===================================
N1     I1    I2     D1      D2
------ ----- ------ ------- -------
     3 3     +3     00003.  +00003.
   -21 -21   -21    -00021. -00021.
   147 147   +147   00147.  +00147.
 -1029 -1029 -1029  -01029. -01029.
  7203 7203  +7203  07203.  +07203.

ANSWER
================================
1          2          3
---------- ---------- ----------
1972-02-12 02/12/1972 12.02.1972
1966-03-03 03/03/1966 03.03.1966
Numeric input can be converted to character using either the DIGITS or the CHAR function,
though the former does not support float. The two functions work differently, and neither gives
perfect output. The CHAR function doesn't properly align positive and negative numbers,
while the DIGITS function loses both the decimal point and sign indicator:
SELECT   d2
        ,CHAR(d2)   AS cd2
        ,DIGITS(d2) AS dd2
FROM    (SELECT DEC(d1,4,1) AS d2
         FROM   scalar
        )AS xxx
ORDER BY 1;

ANSWER
================
D2   CD2    DD2
---- ------ ----
-2.4 -002.4 0024
 0.0 000.0  0000
 1.8 001.8  0018
CHR
Converts integer input in the range 0 through 255 to the equivalent ASCII character value. An
input value above 255 returns 255. The ASCII function (see above) is the inverse of the CHR
function.
SELECT   'A'             AS "c"
        ,ASCII('A')      AS "c>n"
        ,CHR(ASCII('A')) AS "c>n>c"
        ,CHR(333)        AS "nl"
FROM     staff
WHERE    id = 10;

ANSWER
=================
C C>N C>N>C NL
- --- ----- --
A  65 A
CLOB
Converts the input (1st argument) to a CLOB. The output length (2nd argument) is optional.
If the input is truncated during conversion, a warning message is issued. For example, in the
following example the second CLOB statement will induce a warning for the first two lines of
input because they have non-blank data after the third byte:
SELECT   c1
        ,CLOB(c1)   AS cc1
        ,CLOB(c1,3) AS cc2
FROM     scalar;

ANSWER
===================
C1     CC1    CC2
------ ------ ---
ABCDEF ABCDEF ABC
ABCD   ABCD   ABC
AB     AB     AB
COALESCE
Returns the first non-null value in a list of input expressions (reading from left to right). Each
expression is separated from the prior by a comma. All input expressions must be compatible.
VALUE is a synonym for COALESCE.
SELECT   id
        ,comm
        ,COALESCE(comm,0)
FROM     staff
WHERE    id < 30
ORDER BY id;

ANSWER
==================
ID COMM   3
-- ------ ------
10      -   0.00
20 612.45 612.45
ANSWER
========
CC1 CC2
--- ---
 10  10

ANSWER
===================
#ROWS MIN_ID CCC_ID
----- ------ ------
    0      -     -1
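The left-to-right evaluation can be seen in a short sketch that needs no sample tables. The names TEMP1, C1, and C2 are hypothetical, and the nulls have to be given a type before they can be used in a VALUES list:

```sql
-- TEMP1, C1 and C2 are illustrative names, not part of the sample database.
WITH temp1 (c1, c2) AS
  (VALUES (CAST(NULL AS INTEGER), CAST(NULL AS INTEGER))
         ,(CAST(NULL AS INTEGER), 2)
         ,(1                    , 2))
SELECT   c1
        ,c2
        ,COALESCE(c1,c2,0) AS cc1
        ,VALUE(c1,c2,0)    AS cc2
FROM     temp1;
```

The CC1 and CC2 columns should always match. When both inputs are null the answer should be the final constant (0), when only C1 is null it should be C2 (2), and otherwise it should be C1 (1).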
CONCAT
Joins two strings together. The CONCAT function has both "infix" and "prefix" notations. In
the former case, the verb is placed between the two strings to be acted upon. In the latter case,
the two strings come after the verb. Both syntax flavours are illustrated below:
SELECT   'A' || 'B'
        ,'A' CONCAT 'B'
        ,CONCAT('A','B')
        ,'A' || 'B' || 'C'
        ,CONCAT(CONCAT('A','B'),'C')
FROM     staff
WHERE    id = 10;

ANSWER
===================
1   2   3   4   5
--- --- --- --- ---
AB  AB  AB  ABC ABC
When ordinary character fields are concatenated, any blanks at the end of the first field are
left in place. By contrast, concatenating varchar fields removes any (implied) trailing blanks.
If the result of the second type of concatenation is then used in an ORDER BY, the resulting
row sequence will probably be not what the user intended. To illustrate:
ANSWER
===============
COL1 COL2 COL3
---- ---- -----
AE   OOO  AEOOO
AE   YYY  AEYYY
A    YYY  AYYY

ANSWER
===============
COL1 COL2 COL3
---- ---- -----
A    YYY  A YYY
AE   OOO  AEOOO
AE   YYY  AEYYY
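The queries behind the above two answers can be sketched roughly as follows. This is an assumed reconstruction, not necessarily the original SQL, and the table name TEMP1 is illustrative. String constants in a VALUES list are varchar, so the first query concatenates varchar fields, while the second converts COL1 to a fixed-length field first:

```sql
-- Varchar concatenation: no blank is kept after the 'A' value.
WITH temp1 (col1, col2) AS
  (VALUES ('A' ,'YYY')
         ,('AE','OOO')
         ,('AE','YYY'))
SELECT   col1
        ,col2
        ,col1 CONCAT col2 AS col3
FROM     temp1
ORDER BY col3;

-- Fixed-length concatenation: the 'A' value is padded to two bytes.
WITH temp1 (col1, col2) AS
  (VALUES ('A' ,'YYY')
         ,('AE','OOO')
         ,('AE','YYY'))
SELECT   CHAR(col1,2)                 AS col1
        ,col2
        ,CHAR(col1,2) CONCAT col2     AS col3
FROM     temp1
ORDER BY col3;
```

In the first query the value "AYYY" should sort last, because "Y" is greater than "E". In the second query the padded value "A YYY" should sort first, because a blank is less than any letter.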
COS
Returns the cosine of the argument where the argument is an angle expressed in radians. The
output format is double.
WITH temp1(n1) AS
(VALUES (0)
 UNION ALL
 SELECT n1 + 10
 FROM   temp1
 WHERE  n1 < 90)
SELECT   n1
        ,DEC(RADIANS(n1),4,3)      AS ran
        ,DEC(COS(RADIANS(n1)),4,3) AS cos
        ,DEC(SIN(RADIANS(n1)),4,3) AS sin
FROM     temp1;

ANSWER
=======================
N1 RAN   COS   SIN
-- ----- ----- -----
 0 0.000 1.000 0.000
10 0.174 0.984 0.173
20 0.349 0.939 0.342
30 0.523 0.866 0.500
40 0.698 0.766 0.642
50 0.872 0.642 0.766
60 1.047 0.500 0.866
70 1.221 0.342 0.939
80 1.396 0.173 0.984
90 1.570 0.000 1.000
COSH
Returns the hyperbolic cosine for the argument, where the argument is an angle expressed in
radians. The output format is double.
COT
Returns the cotangent of the argument where the argument is an angle expressed in radians.
The output format is double.
DATE
Converts the input into a date value. The nature of the conversion process depends upon the
input type and length:
Char or varchar input that is a valid string representation of a date or a timestamp (e.g.
"1997-12-23") is converted as is.
Char or varchar input that is seven bytes long is assumed to be a Julian date value in the
format yyyynnn where yyyy is the year and nnn is the number of days since the start of
the year (in the range 001 to 366).
Numeric input is assumed to have a value which represents the number of days since the
date "0001-01-01" inclusive. All numeric types are supported, but the fractional part of a
value is ignored (e.g. 12.55 becomes 12 which converts to "0001-01-12").
DATE ( expression )

ANSWER
======================================
TS1                        DT1
-------------------------- ----------
1996-04-22-23.58.58.123456 1996-04-22
1996-08-15-15.15.15.151515 1996-08-15
0001-01-01-00.00.00.000000 0001-01-01

ANSWER
===================
N1      D1
------- ----------
      1 0001-01-01
 728000 1994-03-13
 730120 2000-01-01
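The three kinds of input described above can be tried in one statement. The following is only a sketch, and the column names in the comments are illustrative:

```sql
-- One row, three dates, each built from a different kind of input.
VALUES (DATE('1997-12-23')   -- from_string: ordinary date string
       ,DATE('1997357')      -- from_julian: seven bytes, yyyynnn format
       ,DATE(728771));       -- from_days:   days since 0001-01-01
```

The first two expressions should return the same date, because day 357 of 1997 is the 23rd of December. The third should return 1996-04-22, which is the inverse of what the DAYS function returns for that date.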
DAY
Returns the day (as in day of the month) part of a date (or equivalent) value. The output format is integer.
SELECT   dt1
        ,DAY(dt1) AS day1
FROM     scalar
WHERE    DAY(dt1) > 10;

ANSWER
================
DT1        DAY1
---------- ----
1996-04-22   22
1996-08-15   15
SELECT   dt1
        ,DAY(dt1)                AS day1
        ,dt1 -'1996-04-30'       AS dur2
        ,DAY(dt1 -'1996-04-30')  AS day2
FROM     scalar
WHERE    DAY(dt1) > 10
ORDER BY dt1;

ANSWER
=========================
DT1        DAY1 DUR2 DAY2
---------- ---- ---- ----
1996-04-22   22  -8.   -8
1996-08-15   15 315.   15
DAYNAME
Returns the name of the day (e.g. Friday) as contained in a date (or equivalent) value. The
output format is varchar(100).
SELECT   dt1
        ,DAYNAME(dt1)         AS dy1
        ,LENGTH(DAYNAME(dt1)) AS dy2
FROM     scalar
WHERE    DAYNAME(dt1) LIKE '%a%y'
ORDER BY dt1;

ANSWER
========================
DT1        DY1      DY2
---------- -------- ---
0001-01-01 Monday     6
1996-04-22 Monday     6
1996-08-15 Thursday   8
DAYOFWEEK
Returns a number that represents the day of the week (where Sunday is 1 and Saturday is 7)
from a date (or equivalent) value. The output format is integer.
SELECT   dt1
        ,DAYOFWEEK(dt1) AS dwk
        ,DAYNAME(dt1)   AS dnm
FROM     scalar
ORDER BY dwk
        ,dnm;

ANSWER
=========================
DT1        DWK DNM
---------- --- --------
0001-01-01   2 Monday
1996-04-22   2 Monday
1996-08-15   5 Thursday
DAYOFWEEK_ISO
Returns an integer value that represents the day of the "ISO" week. An ISO week differs from
an ordinary week in that it begins on a Monday (i.e. day-number = 1) and it neither ends nor
begins at the exact end of the year. Instead, the final ISO week of the prior year will continue
into the new year. This often means that the first days of the year have an ISO week number
of 52, and that one gets more than seven days in a year for ISO week 52.
WITH
temp1 (n) AS
(VALUES (0)
 UNION ALL
 SELECT n+1
 FROM   temp1
 WHERE  n < 9),
temp2 (dt1) AS
(VALUES(DATE('1999-12-25'))
      ,(DATE('2000-12-24'))),
temp3 (dt2) AS
(SELECT dt1 + n DAYS
 FROM   temp1
       ,temp2)
SELECT   CHAR(dt2,ISO)             AS date
        ,SUBSTR(DAYNAME(dt2),1,3)  AS day
        ,WEEK(dt2)                 AS w
        ,DAYOFWEEK(dt2)            AS d
        ,WEEK_ISO(dt2)             AS wi
        ,DAYOFWEEK_ISO(dt2)        AS i
FROM     temp3
ORDER BY 1;

ANSWER
========================
DATE       DAY W  D WI I
---------- --- -- - -- -
1999-12-25 Sat 52 7 51 6
1999-12-26 Sun 53 1 51 7
1999-12-27 Mon 53 2 52 1
1999-12-28 Tue 53 3 52 2
1999-12-29 Wed 53 4 52 3
1999-12-30 Thu 53 5 52 4
1999-12-31 Fri 53 6 52 5
2000-01-01 Sat  1 7 52 6
2000-01-02 Sun  2 1 52 7
2000-01-03 Mon  2 2  1 1
2000-12-24 Sun 53 1 51 7
2000-12-25 Mon 53 2 52 1
2000-12-26 Tue 53 3 52 2
2000-12-27 Wed 53 4 52 3
2000-12-28 Thu 53 5 52 4
2000-12-29 Fri 53 6 52 5
2000-12-30 Sat 53 7 52 6
2000-12-31 Sun 54 1 52 7
2001-01-01 Mon  1 2  1 1
2001-01-02 Tue  1 3  1 2
DAYOFYEAR
Returns a number that is the day of the year (from 1 to 366) from a date (or equivalent) value.
The output format is integer.
SELECT   dt1
        ,DAYOFYEAR(dt1) AS dyr
FROM     scalar
ORDER BY dyr;

ANSWER
===============
DT1        DYR
---------- ---
0001-01-01   1
1996-04-22 113
1996-08-15 228
DAYS
Converts a date (or equivalent) value into a number that represents the number of days since
the date "0001-01-01" inclusive. The output format is INTEGER.
SELECT   dt1
        ,DAYS(dt1) AS dy1
FROM     scalar
ORDER BY dy1
        ,dt1;

ANSWER
==================
DT1        DY1
---------- ------
0001-01-01      1
1996-04-22 728771
1996-08-15 728886
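One common use of the DAYS function is to get the number of days between two dates as a plain integer, rather than as a date duration. A minimal sketch, using the two 1996 dates shown above:

```sql
-- Subtract one day-number from the other to get the gap in days.
VALUES DAYS(DATE('1996-08-15')) - DAYS(DATE('1996-04-22'));
```

Going by the values in the above answer, the result should be 115 (i.e. 728886 minus 728771).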
DBCLOB
Converts the input (1st argument) to a dbclob. The output length (2nd argument) is optional.
DBPARTITIONNUM
Returns the partition number of the row. The result is zero if the table is not partitioned. The
output is of type integer, and is never null.
DBPARTITIONNUM ( column-name )
SELECT   DBPARTITIONNUM(id) AS dbnum
FROM     staff
WHERE    id = 10;

ANSWER
======
DBNUM
-----
    0
DECIMAL or DEC
Converts either character or numeric input to decimal. When the input is of type character, the
decimal point format can be specified.
DECIMAL or DEC ( number [, precision [, scale]] )
DECIMAL or DEC ( char [, precision [, scale [, dec]]] )

ANSWER
==========================
DEC1  DEC2   DEC3   DEC4
----- ------ ------ ------
 123.  100.0  123.4  567.8
DEGREES
Returns the number of degrees converted from the argument as expressed in radians. The output format is double.
DEREF
DECRYPT_BIN and DECRYPT_CHAR
Decrypts data that has been encrypted using the ENCRYPT function. Use the BIN function to
decrypt binary data (e.g. BLOBS, CLOBS) and the CHAR function to do character data. Numeric data cannot be encrypted.
DECRYPT_BIN or DECRYPT_CHAR ( encrypted data [, password] )
SELECT   id
        ,name
        ,DECRYPT_CHAR(name2,'CLUELESS') AS name3
        ,GETHINT(name2)                 AS hint
        ,name2
FROM    (SELECT id
               ,name
               ,ENCRYPT(name,'CLUELESS','MY BOSS') AS name2
         FROM   staff
         WHERE  id < 30
        )AS xxx
ORDER BY id;
DIFFERENCE
Returns the difference between the sounds of two strings as determined using the SOUNDEX
function. The output (of type integer) ranges from 4 (good match) to zero (poor match).
SELECT   a.name                      AS n1
        ,SOUNDEX(a.name)             AS s1
        ,b.name                      AS n2
        ,SOUNDEX(b.name)             AS s2
        ,DIFFERENCE(a.name,b.name)   AS df
FROM     staff a
        ,staff b
WHERE    a.id = 10
  AND    b.id > 150
  AND    b.id < 250
ORDER BY df DESC
        ,n2 ASC;

ANSWER
==============================
N1      S1   N2        S2   DF
------- ---- --------- ---- --
Sanders S536 Sneider   S536  4
Sanders S536 Smith     S530  3
Sanders S536 Lundquist L532  2
Sanders S536 Daniels   D542  1
Sanders S536 Molinare  M456  1
Sanders S536 Scoutten  S350  1
Sanders S536 Abrahams  A165  0
Sanders S536 Kermisch  K652  0
Sanders S536 Lu        L000  0
DIGITS
Converts an integer or decimal value into a character string with leading zeros. Both the sign
indicator and the decimal point are lost in the translation.
SELECT   s1
        ,DIGITS(s1) AS ds1
        ,d1
        ,DIGITS(d1) AS dd1
FROM     scalar;

ANSWER
=========================
S1     DS1   D1    DD1
------ ----- ----- ---
    -2 00002  -2.4 024
     0 00000   0.0 000
     1 00001   1.8 018
The CHAR function can sometimes be used as alternative to the DIGITS function. Their output differs slightly - see page 337 for a comparison.
NOTE: Neither the DIGITS nor the CHAR function do a great job of converting numbers to
characters. See page 337 for some user-defined functions that can be used instead.
DLCOMMENT
Returns a DATALINK value which has an attribute indicating that the referenced file has
changed.
DLPREVIOUSCOPY
Returns a DATALINK value which has an attribute indicating that the previous version of the
file should be restored.
DLREPLACECONTENT
Returns a DATALINK value. When the function is used in an UPDATE or INSERT the contents of the target file is replaced by another.
DLURLCOMPLETE
Returns the URL value from a DATALINK value with a link type of URL.
DLURLCOMPLETEONLY
Returns the data location attribute from a DATALINK value with a link type of URL.
DLURLCOMPLETEWRITE
Returns the complete URL value from a DATALINK value with a link type of URL.
DLURLPATH
Returns the path and file name necessary to access a file within a given server from a DATALINK value with linktype of URL.
DLURLPATHONLY
Returns the path and file name necessary to access a file within a given server from a DATALINK value with a linktype of URL. The value returned never includes a file access token.
DLURLPATHWRITE
Returns the path and file name necessary to access a file within a given server from a DATALINK value with a linktype of URL. The value returned includes a write token if the
DATALINK value comes from a DATALINK column with write permission.
DLURLSCHEME
Returns the scheme from a DATALINK value with a link type of URL.
DLURLSERVER
Returns the file server from a datalink value with a linktype of URL.
DLVALUE
DOUBLE or DOUBLE_PRECISION
Converts numeric or valid character input to type double. This function is actually two with
the same name. The one that converts numeric input is a SYSIBM function, while the other
that handles character input is a SYSFUN function. The keyword DOUBLE_PRECISION has
not been defined for the latter.
WITH temp1(c1,d1) AS
(VALUES  ('12345',12.4)
        ,('-23.5',1234)
        ,('1E+45',-234)
        ,('-2e05',+2.4))
SELECT   DOUBLE(c1) AS c1d
        ,DOUBLE(d1) AS d1d
FROM     temp1;
ENCRYPT
Returns an encrypted rendition of the input string. The input must be char or varchar. The output is varchar for bit data.
ENCRYPT ( encrypted data [, password [, hint]] )
ENCRYPTED DATA: A char or varchar string of up to 32633 bytes that is to be encrypted. Numeric data must be converted to character before encryption.
PASSWORD: A char or varchar string of at least six bytes and no more than 127 bytes. If
the value is null or not provided, the current value of the encryption password special register will be used. Be aware that a password that is padded with blanks is not the same as
one that lacks the blanks.
HINT: A char or varchar string of up to 32 bytes that can be referred to if one forgets
what the password is. It is included with the encrypted string and can be retrieved using
the GETHINT function.
When the hint is provided, the output length is the length of the input data, plus eight bytes,
plus the distance to the next eight-byte boundary, plus thirty-two bytes for the hint.
When the hint is not provided, the output length is the length of the input data, plus eight
bytes, plus the distance to the next eight-byte boundary.
SELECT   id
        ,name
        ,ENCRYPT(name,'THAT IDIOT','MY BROTHER') AS name2
FROM     staff
WHERE    ID < 30
ORDER BY id;
EXP
Returns the exponential function of the argument. The output format is double.
WITH temp1(n1) AS
(VALUES (0)
 UNION ALL
 SELECT n1 + 1
 FROM   temp1
 WHERE  n1 < 10)
SELECT   n1
        ,EXP(n1)           AS e1
        ,SMALLINT(EXP(n1)) AS e2
FROM     temp1;

ANSWER
==============================
N1 E1                    E2
-- --------------------- -----
 0 +1.00000000000000E+0      1
 1 +2.71828182845904E+0      2
 2 +7.38905609893065E+0      7
 3 +2.00855369231876E+1     20
 4 +5.45981500331442E+1     54
 5 +1.48413159102576E+2    148
 6 +4.03428793492735E+2    403
 7 +1.09663315842845E+3   1096
 8 +2.98095798704172E+3   2980
 9 +8.10308392757538E+3   8103
10 +2.20264657948067E+4  22026
FLOAT
Same as DOUBLE.
FLOOR
Returns the next largest integer value that is smaller than or equal to the input (e.g. 5.945 returns 5.000). The output field type will equal the input field type.
SELECT d1
,FLOOR(d1) AS d2
,f1
,FLOOR(f1) AS f2
FROM
scalar;
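The difference between FLOOR and CEIL is easiest to see with a negative number, where "next largest" and "next smallest" are easy to get backwards. The following is a small sketch that does not depend on the SCALAR view:

```sql
-- Same inputs, rounded up (CEIL) and down (FLOOR).
VALUES (CEIL(-2.4), FLOOR(-2.4), CEIL(+2.4), FLOOR(+2.4));
```

The four values returned should be -2, -3, 3, and 2 respectively, each in the same decimal type as the input. Observe that FLOOR of -2.4 moves away from zero, not towards it.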
GENERATE_UNIQUE
Uses the system clock and node number to generate a value that is guaranteed unique (as long
as one does not reset the clock). The output is of type char(13) for bit data. There are no
arguments. The result is essentially a timestamp (set to GMT, not local time), with the node
number appended to the back.
SELECT   id
        ,GENERATE_UNIQUE()               AS unique_val#1
        ,DEC(HEX(GENERATE_UNIQUE()),26)  AS unique_val#2
FROM     staff
WHERE    id < 50
ORDER BY id;

ANSWER
=============================================
ID UNIQUE_VAL#1   UNIQUE_VAL#2
-- -------------- ---------------------------
10                20011017191648990521000000.
20                20011017191648990615000000.
30                20011017191648990642000000.
40                20011017191648990669000000.
One thing that DB2 lacks is a random number generator that makes unique values. However,
if we flip the characters returned in the GENERATE_UNIQUE output, we have something
fairly close to what is needed. Unfortunately, DB2 also lacks a REVERSE function, so the
data flipping has to be done the hard way.
SELECT   u1
        ,SUBSTR(u1,20,1) CONCAT SUBSTR(u1,19,1) CONCAT
         SUBSTR(u1,18,1) CONCAT SUBSTR(u1,17,1) CONCAT
         SUBSTR(u1,16,1) CONCAT SUBSTR(u1,15,1) CONCAT
         SUBSTR(u1,14,1) CONCAT SUBSTR(u1,13,1) CONCAT
         SUBSTR(u1,12,1) CONCAT SUBSTR(u1,11,1) CONCAT
         SUBSTR(u1,10,1) CONCAT SUBSTR(u1,09,1) CONCAT
         SUBSTR(u1,08,1) CONCAT SUBSTR(u1,07,1) CONCAT
         SUBSTR(u1,06,1) CONCAT SUBSTR(u1,05,1) CONCAT
         SUBSTR(u1,04,1) CONCAT SUBSTR(u1,03,1) CONCAT
         SUBSTR(u1,02,1) CONCAT SUBSTR(u1,01,1) AS U2
FROM    (SELECT HEX(GENERATE_UNIQUE()) AS u1
         FROM   staff
         WHERE  id < 50) AS xxx
ORDER BY u2;

ANSWER
================================================
U1                         U2
-------------------------- --------------------
20000901131649119940000000 04991194613110900002
20000901131649119793000000 39791194613110900002
20000901131649119907000000 70991194613110900002
20000901131649119969000000 96991194613110900002
One can refer to a user-defined reverse function (see page 347 for the definition code) to flip
the U1 value, and thus greatly simplify the query:
SELECT   u1
        ,SUBSTR(reverse(CHAR(u1)),7,20) AS u2
FROM    (SELECT HEX(GENERATE_UNIQUE()) AS u1
         FROM   STAFF
         WHERE  ID < 50) AS xxx
ORDER BY U2;
GETHINT
SELECT   id
        ,name
        ,GETHINT(name2) AS hint
FROM    (SELECT id
               ,name
               ,ENCRYPT(name,'THAT IDIOT','MY BROTHER') AS name2
         FROM   staff
         WHERE  id < 30
        )AS xxx
ORDER BY id;

ANSWER
=====================
ID NAME    HINT
-- ------- ----------
10 Sanders MY BROTHER
20 Pernal  MY BROTHER
GRAPHIC
Converts the input (1st argument) to a graphic data type. The output length (2nd argument) is
optional.
HASHEDVALUE
Returns the partition number of the row. The result is zero if the table is not partitioned. The
output is of type integer, and is never null.
SELECT   HASHEDVALUE(id) AS hvalue
FROM     staff
WHERE    id = 10;

ANSWER
======
HVALUE
------
     0
HEX
Returns the hexadecimal representation of a value. All input types are supported.
WITH temp1(n1) AS
(VALUES (-3)
 UNION ALL
 SELECT n1 + 1
 FROM   temp1
 WHERE  n1 < 3)
SELECT   SMALLINT(n1)       AS s
        ,HEX(SMALLINT(n1))  AS shx
        ,HEX(DEC(n1,4,0))   AS dhx
        ,HEX(DOUBLE(n1))    AS fhx
FROM     temp1;

ANSWER
===============================
S  SHX  DHX    FHX
-- ---- ------ ----------------
-3 FDFF 00003D 00000000000008C0
-2 FEFF 00002D 00000000000000C0
-1 FFFF 00001D 000000000000F0BF
 0 0000 00000C 0000000000000000
 1 0100 00001C 000000000000F03F
 2 0200 00002C 0000000000000040
 3 0300 00003C 0000000000000840
ANSWER
=======================================
C1
CHX
V1
VHX
------ ------------ ------ -----------ABCDEF 414243444546 ABCDEF 414243444546
ABCD
414243442020 ABCD
41424344
AB
414220202020 AB
4142
ANSWER
===================================
DT1
DTHX
TM1
TMHX
---------- -------- -------- -----1996-04-22 19960422 23:58:58 235858
1996-08-15 19960815 15:15:15 151515
0001-01-01 00010101 00:00:00 000000
HOUR
Returns the hour (as in hour of day) part of a time value. The output format is integer.
SELECT   tm1
        ,HOUR(tm1) AS hr
FROM     scalar
ORDER BY tm1;
ANSWER
============
TM1      HR
-------- --
00:00:00  0
15:15:15 15
23:58:58 23
IDENTITY_VAL_LOCAL
Returns the most recently assigned value (by the current user) to an identity column. The result type is decimal (31,0), regardless of the field type of the identity column. See page 265
for detailed notes on using this function.
CREATE TABLE seq#
(ident_val  INTEGER   NOT NULL GENERATED ALWAYS AS IDENTITY
,cur_ts     TIMESTAMP NOT NULL
,PRIMARY KEY (ident_val));
COMMIT;
INSERT INTO seq# VALUES(DEFAULT,CURRENT TIMESTAMP);
WITH temp (idval) AS
(VALUES (IDENTITY_VAL_LOCAL()))
SELECT *
FROM   temp;
ANSWER
======
IDVAL
-----
   1.
INSERT
Insert one string in the middle of another, replacing a portion of what was already there. If the
value to be inserted is either longer or shorter than the piece being replaced, the remainder of
the data (on the right) is shifted either left or right accordingly in order to make a good fit.
INSERT ( source , start-pos , del-bytes , new-value )
The first and last parameters must always have matching field types.
To insert a new value in the middle of another without removing any of what is already
there, set the third parameter to zero.
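The replace-and-shift behavior described above can be modeled outside DB2. The following Python sketch is only an illustration: the helper name db2_insert is hypothetical, and it counts characters where DB2 counts bytes.

```python
def db2_insert(source: str, start_pos: int, del_bytes: int, new_value: str) -> str:
    """Model of INSERT(source, start-pos, del-bytes, new-value).

    start_pos is 1-based, as in SQL. del_bytes characters are removed
    starting at start_pos, and new_value is put in their place.
    """
    if start_pos < 1 or del_bytes < 0:
        raise ValueError("start_pos must be >= 1 and del_bytes must be >= 0")
    head = source[:start_pos - 1]              # everything before the insertion point
    tail = source[start_pos - 1 + del_bytes:]  # everything after the removed piece
    return head + new_value + tail


# Replace two characters with three: the remainder shifts right.
print(db2_insert("ABCDEF", 3, 2, "xyz"))
# Third parameter of zero: nothing is removed, the new value is simply added.
print(db2_insert("ABCDEF", 3, 0, "xyz"))
```

A third parameter of zero gives a pure insertion, which is the case the last note above refers to.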
The INTEGER or INT function converts either a number or a valid character value into an
integer. The character input can have leading and/or trailing blanks, and a sign indicator, but it
cannot contain a decimal point. Numeric decimal input works just fine.
SELECT d1
      ,INTEGER(d1)
      ,INT('+123')
      ,INT('-123')
      ,INT(' 123 ')
FROM   scalar;
ANSWER
====================================
D1    2     3      4      5
----- ----- ------ ------ ------
 -2.4    -2    123   -123    123
  0.0     0    123   -123    123
  1.8     1    123   -123    123
JULIAN_DAY
Converts a date (or equivalent) value into a number which represents the number of days
since January the 1st, 4,713 BC. The output format is integer.
WITH temp1(dt1) AS
(VALUES ('0001-01-01-00.00.00')
       ,('1752-09-10-00.00.00')
       ,('1993-01-03-00.00.00')
       ,('1993-01-03-23.59.59'))
SELECT DATE(dt1)       AS dt
      ,DAYS(dt1)       AS dy
      ,JULIAN_DAY(dt1) AS dj
FROM   temp1;
ANSWER
=========================
DT         DY     DJ
---------- ------ -------
0001-01-01      1 1721426
1752-09-10 639793 2361218
1993-01-03 727566 2448991
1993-01-03 727566 2448991
For dates in the period from 1859 to about 2130 only five digits need to be used to specify the
date rather than seven.
MJD 0 thus corresponds to JD 2,400,000.5, which is twelve hours after noon on JD 2,400,000
= 1858-11-16. Thus MJD 0 designates the midnight of November 16th/17th, 1858, so day 0
in the system of modified Julian day numbers is the day 1858-11-17.
The following SQL statement uses the JULIAN_DAY function to get the Julian Date for certain days. The same calculation is also done using hand-coded SQL.
SELECT   bd
        ,JULIAN_DAY(bd)
        ,(1461 * (YEAR(bd) + 4800 + (MONTH(bd)-14)/12))/4
        +( 367 * (MONTH(bd)- 2 - 12*((MONTH(bd)-14)/12)))/12
        -(   3 * ((YEAR(bd) + 4900 + (MONTH(bd)-14)/12)/100))/4
        +DAY(bd) - 32075
FROM    (SELECT birthdate AS bd
         FROM   employee
         WHERE  midinit = 'R'
        ) AS xxx
ORDER BY bd;
ANSWER
==========================
BD         2       3
---------- ------- -------
1926-05-17 2424653 2424653
1936-03-28 2428256 2428256
1946-07-09 2432011 2432011
1955-04-12 2435210 2435210
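The hand-coded calculation relies on SQL integer division, which truncates toward zero, so (MONTH-14)/12 is -1 for January and February and 0 otherwise. The Python sketch below repeats the same arithmetic to make that step visible. The helper names tdiv and julian_day are hypothetical, and Python's own // operator is deliberately not used directly because it rounds toward negative infinity.

```python
def tdiv(a: int, b: int) -> int:
    """Integer division that truncates toward zero, as SQL integer division does."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def julian_day(year: int, month: int, day: int) -> int:
    """Julian day number for a Gregorian date, using the formula in the query above."""
    t = tdiv(month - 14, 12)  # -1 for January/February, 0 for March onward
    return (tdiv(1461 * (year + 4800 + t), 4)
            + tdiv(367 * (month - 2 - 12 * t), 12)
            - tdiv(3 * tdiv(year + 4900 + t, 100), 4)
            + day - 32075)


for y, m, d in [(1926, 5, 17), (1936, 3, 28), (1946, 7, 9), (1993, 1, 3)]:
    print(y, m, d, julian_day(y, m, d))
```

Under these assumptions the function gives 2400001 for 1858-11-17, which agrees with the statement that day 0 of the modified Julian day system falls on that date.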
Many computer users think of the "Julian Date" as a date format that has a layout of "yynnn"
or "yyyynnn" where "yy" is the year and "nnn" is the number of days since the start of the
same. A more correct use of the term "Julian Date" refers to the current date according to the
calendar as originally defined by Julius Caesar, which has a leap year every fourth year.
In the US/UK, this calendar was in effect until "1752-09-14". The days between the 3rd and
13th of September in 1752 were not used in order to put everything back in sync. In the 20th
and 21st centuries, to derive the Julian date one must subtract 13 days from the relevant Gregorian date (e.g. 1994-01-22 becomes 1994-01-09).
The following SQL illustrates how to convert a standard DB2 Gregorian Date to an equivalent Julian Date (calendar) and a Julian Date (output format):
WITH temp1(dt1) AS
(VALUES ('1997-01-01')
       ,('1997-01-02')
       ,('1997-12-31'))
SELECT DATE(dt1)                         AS dt
      ,DATE(dt1) - 13 DAYS               AS dj1
      ,YEAR(dt1) * 1000 + DAYOFYEAR(dt1) AS dj2
FROM   temp1;
ANSWER
=============================
DT         DJ1        DJ2
---------- ---------- -------
1997-01-01 1996-12-19 1997001
1997-01-02 1996-12-20 1997002
1997-12-31 1997-12-18 1997365
LCASE or LOWER
Converts a mixed or upper-case string to lower case. The output is the same data type and
length as the input.
SELECT name
      ,LCASE(name) AS lname
      ,UCASE(name) AS uname
FROM   staff
WHERE  id < 30;
ANSWER
=========================
NAME    LNAME   UNAME
------- ------- -------
Sanders sanders SANDERS
Pernal  pernal  PERNAL
The LEFT function has two arguments: The first is an input string of type char, varchar, clob,
or blob. The second is a positive integer value. The output is the left most characters in the
string. Trailing blanks are not removed.
WITH temp1(c1) AS
(VALUES ('  ABC')
       ,(' ABC ')
       ,('ABC  '))
SELECT c1
      ,LEFT(c1,4)         AS c2
      ,LENGTH(LEFT(c1,4)) AS l2
FROM   temp1;
ANSWER
================
C1    C2    L2
----- ----- --
  ABC   AB   4
 ABC   ABC   4
ABC   ABC    4
LENGTH
Returns an integer value with the internal length of the expression (except for double-byte
string types, which return the length in characters). The value will be the same for all fields in
a column, except for columns containing varying-length strings.
SELECT LENGTH(d1)
      ,LENGTH(f1)
      ,LENGTH(s1)
      ,LENGTH(c1)
      ,LENGTH(RTRIM(c1))
FROM   scalar;
ANSWER
=======================
1   2   3   4   5
--- --- --- --- ---
  2   8   2   6   6
  2   8   2   6   4
  2   8   2   6   2
LN or LOG
Returns the natural logarithm of the argument (same as LOG). The output format is double.
WITH temp1(n1) AS
(VALUES (1),(123),(1234)
       ,(12345),(123456))
SELECT n1
      ,LOG(n1) AS l1
FROM   temp1;
ANSWER
===============================
N1     L1
------ -----------------------
     1  +0.00000000000000E+000
   123  +4.81218435537241E+000
  1234  +7.11801620446533E+000
 12345  +9.42100640177928E+000
123456  +1.17236400962654E+001
LOCATE
Returns an integer value with the absolute starting position of the first occurrence of the first
string within the second string. If there is no match the result is zero. The optional third parameter indicates where to start the search.
LOCATE ( find-string , look-in-string [ , start-pos ] )
ANSWER
==========================
C1     2   3   4   5
------ --- --- --- ---
ABCDEF   4   4   5   0
ABCD     4   4   0   0
AB       0   0   0   0
LOG10
Returns the base ten logarithm of the argument. The output format is double.
WITH temp1(n1) AS
(VALUES (1),(123),(1234)
       ,(12345),(123456))
SELECT n1
      ,LOG10(n1) AS l1
FROM   temp1;
ANSWER
===============================
N1     L1
------ -----------------------
     1  +0.00000000000000E+000
   123  +2.08990511143939E+000
  1234  +3.09131515969722E+000
 12345  +4.09149109426795E+000
123456  +5.09151220162777E+000
LONG_VARCHAR
Converts the input (1st argument) to a long_varchar data type. The output length (2nd argument) is optional.
LONG_VARGRAPHIC
Converts the input (1st argument) to a long_vargraphic data type. The output length (2nd argument) is optional.
LOWER
LTRIM
Remove leading blanks, but not trailing blanks, from the argument.
WITH temp1(c1) AS
(VALUES ('  ABC')
       ,(' ABC ')
       ,('ABC  '))
SELECT c1
      ,LTRIM(c1)         AS c2
      ,LENGTH(LTRIM(c1)) AS l2
FROM   temp1;
ANSWER
================
C1    C2    L2
----- ----- --
  ABC ABC    3
 ABC  ABC    4
ABC   ABC    5
MICROSECOND
Returns the microsecond part of a timestamp (or equivalent) value. The output is integer.
SELECT
ts1
,MICROSECOND(ts1)
FROM
scalar
ORDER BY ts1;
ANSWER
======================================
TS1                        2
-------------------------- -----------
0001-01-01-00.00.00.000000           0
1996-04-22-23.58.58.123456      123456
1996-08-15-15.15.15.151515      151515
MIDNIGHT_SECONDS
Returns the number of seconds since midnight from a timestamp, time or equivalent value.
The output format is integer.
SELECT   ts1
        ,MIDNIGHT_SECONDS(ts1)
        ,HOUR(ts1)*3600 +
         MINUTE(ts1)*60 +
         SECOND(ts1)
FROM     scalar
ORDER BY ts1;
ANSWER
======================================
TS1                        2     3
-------------------------- ----- -----
0001-01-01-00.00.00.000000     0     0
1996-04-22-23.58.58.123456 86338 86338
1996-08-15-15.15.15.151515 54915 54915
MINUTE
Returns the minute part of a time or timestamp (or equivalent) value. The output is integer.
SELECT
ts1
,MINUTE(ts1)
FROM
scalar
ORDER BY ts1;
ANSWER
======================================
TS1                        2
-------------------------- -----------
0001-01-01-00.00.00.000000           0
1996-04-22-23.58.58.123456          58
1996-08-15-15.15.15.151515          15
MOD
Returns the remainder (modulus) for the first argument divided by the second. In the following example the last column uses the MOD function to get the modulus, while the second to
last column obtains the same result using simple arithmetic.
WITH temp1(n1,n2) AS
(VALUES (-31,+11)
 UNION ALL
 SELECT n1 + 13
       ,n2 - 4
 FROM   temp1
 WHERE  n1 < 60
)
SELECT   n1
        ,n2
        ,n1/n2           AS div
        ,n1-((n1/n2)*n2) AS md1
        ,MOD(n1,n2)      AS md2
FROM     temp1
ORDER BY 1;
ANSWER
=======================
 N1  N2 DIV MD1 MD2
--- --- --- --- ---
-31  11  -2  -9  -9
-18   7  -2  -4  -4
 -5   3  -1  -2  -2
  8  -1  -8   0   0
 21  -5  -4   1   1
 34  -9  -3   7   7
 47 -13  -3   8   8
 60 -17  -3   9   9
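The answer above shows that the remainder takes the sign of the first argument, because the integer division truncates toward zero. Not every language behaves this way, so the following Python sketch reproduces the MD1 arithmetic explicitly. The helper names db2_div and db2_mod are hypothetical; Python's built-in % operator is shown only for contrast, since it follows the sign of the divisor.

```python
def db2_div(n1: int, n2: int) -> int:
    """Integer division truncating toward zero, matching the DIV column above."""
    q = abs(n1) // abs(n2)
    return q if (n1 < 0) == (n2 < 0) else -q


def db2_mod(n1: int, n2: int) -> int:
    """Remainder computed as n1 - ((n1/n2) * n2), matching the MD1 and MD2 columns."""
    return n1 - db2_div(n1, n2) * n2


n1, n2 = -31, 11
while n1 <= 60:
    # Last column is Python's own %, which differs whenever the signs differ.
    print(n1, n2, db2_div(n1, n2), db2_mod(n1, n2), n1 % n2)
    n1, n2 = n1 + 13, n2 - 4
```

For the first row the sketch gives -9, as in the answer, while Python's % gives 2.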
MONTH
Returns an integer value in the range 1 to 12 that represents the month part of a date or timestamp (or equivalent) value.
MONTHNAME
Returns the name of the month (e.g. October) as contained in a date (or equivalent) value. The
output format is varchar(100).
SELECT
dt1
,MONTH(dt1)
,MONTHNAME(dt1)
FROM
scalar
ORDER BY dt1;
ANSWER
=======================
DT1        2  3
---------- -- -------
0001-01-01  1 January
1996-04-22  4 April
1996-08-15  8 August
MULTIPLY_ALT
Returns the product of two arguments as a decimal value. Use this function instead of the
multiplication operator when you need to avoid an overflow error because DB2 is putting
aside too much space for the scale (i.e. fractional part of number). Valid input is any exact
numeric type: decimal, integer, bigint, or smallint (but not float).
WITH temp1 (n1,n2) AS
(VALUES (DECIMAL(1234,10)
        ,DECIMAL(1234,10)))                ANSWER
                                           ========
SELECT n1                              >>  1234.
      ,n2                              >>  1234.
      ,n1 * n2                  AS p1  >>  1522756.
      ,"*"(n1,n2)               AS p2  >>  1522756.
      ,MULTIPLY_ALT(n1,n2)      AS p3  >>  1522756.
FROM   temp1;
                                       <--MULTIPLY_ALT->
           RESULT       RESULT         SCALE    PRECSION
INPUT#2    "*" OPERATOR MULTIPLY_ALT   TRUNCATD TRUNCATD
========== ============ ============   ======== ========
DEC(05,00) DEC(10,00)   DEC(10,00)     NO       NO
DEC(11,03) DEC(21,08)   DEC(21,08)     NO       NO
DEC(21,13) DEC(31,28)   DEC(31,18)     YES      NO
DEC(10,01) DEC(31,24)   DEC(31,19)     YES      NO
DEC(15,08) DEC(31,11)   DEC(31,03)     YES      YES
NULLIF
Returns null if the two values being compared are equal, otherwise returns the first value.
SELECT s1
      ,NULLIF(s1,0)
      ,c1
      ,NULLIF(c1,'AB')
FROM   scalar
WHERE  NULLIF(0,0) IS NULL;
ANSWER
=====================
S1  2   C1     4
--- --- ------ ------
 -2  -2 ABCDEF ABCDEF
  0   - ABCD   ABCD
  1   1 AB     -
PARTITION
Returns the partition map index of the row. The result is zero if the table is not partitioned.
The output is of type integer, and is never null.
SELECT PARTITION(id) AS pp
FROM   staff
WHERE  id = 10;
ANSWER
======
PP
--
 0
POSSTR
Returns the position at which the second string is contained in the first string. If there is no
match the value is zero. The test is case sensitive. The output format is integer.
SELECT   c1
        ,POSSTR(c1,' ')  AS p1
        ,POSSTR(c1,'CD') AS p2
        ,POSSTR(c1,'cd') AS p3
FROM     scalar
ORDER BY 1;
ANSWER
==================
C1     P1 P2 P3
------ -- -- --
AB      3  0  0
ABCD    5  3  0
ABCDEF  0  3  0
The LOCATE and POSSTR functions are very similar. Both look for matching strings
searching from the left. The only functional differences are that the input parameters are reversed and the LOCATE function enables one to begin the search at somewhere other than
the start. When either is suitable for the task at hand, it is probably better to use the POSSTR
function because it is a SYSIBM function and so should be faster.
SELECT   c1
        ,POSSTR(c1,' ')   AS p1
        ,LOCATE(' ',c1)   AS l1
        ,POSSTR(c1,'CD')  AS p2
        ,LOCATE('CD',c1)  AS l2
        ,POSSTR(c1,'cd')  AS p3
        ,LOCATE('cd',c1)  AS l3
        ,LOCATE('D',c1,2) AS l4
FROM     scalar
ORDER BY 1;
ANSWER
===========================
C1     P1 L1 P2 L2 P3 L3 L4
------ -- -- -- -- -- -- --
AB      3  3  0  0  0  0  0
ABCD    5  5  3  3  0  0  4
ABCDEF  0  0  3  3  0  0  4
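The reversed parameter order is easy to get wrong, so here is a small Python model of the two functions. It is a sketch only: the names posstr and locate are hypothetical stand-ins, positions are 1-based as in SQL, zero means no match, and C1 is assumed to be a CHAR(6) field padded with blanks.

```python
def posstr(look_in: str, find: str) -> int:
    """Model of POSSTR(look-in-string, find-string): 1-based position, 0 if absent."""
    return look_in.find(find) + 1


def locate(find: str, look_in: str, start_pos: int = 1) -> int:
    """Model of LOCATE(find-string, look-in-string, start-pos): note the reversed order."""
    return look_in.find(find, start_pos - 1) + 1


for c1 in ("AB    ", "ABCD  ", "ABCDEF"):  # values padded to six bytes
    print(repr(c1),
          posstr(c1, " "), locate(" ", c1),
          posstr(c1, "CD"), locate("CD", c1),
          posstr(c1, "cd"), locate("cd", c1),
          locate("D", c1, 2))
```

Because the search is case sensitive, the lower-case "cd" is never found, which matches the P3 and L3 columns.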
POWER
Returns the value of the first argument to the power of the second argument.
WITH temp1(n1) AS
(VALUES (1),(10),(100))
SELECT n1
      ,POWER(n1,1) AS p1
      ,POWER(n1,2) AS p2
      ,POWER(n1,3) AS p3
FROM   temp1;
ANSWER
===============================
N1      P1      P2      P3
------- ------- ------- -------
      1       1       1       1
     10      10     100    1000
    100     100   10000 1000000
QUARTER
Returns an integer value in the range 1 to 4 that represents the quarter of the year from a date
or timestamp (or equivalent) value.
RADIANS
Returns the number of radians converted from the input, which is expressed in degrees. The
output format is double.
RAISE_ERROR
Causes the SQL statement to stop and return a user-defined error message when invoked.
There are a lot of usage restrictions involving this function; see the SQL Reference for details.
RAISE_ERROR ( sqlstate , error-message )
ANSWER
==============
S1     S2
------ ------
    -2     -2
     0      0
SQLSTATE=80001
RAND
Returns a pseudo-random floating-point value in the range of zero to one inclusive. An optional seed value can be provided to get reproducible random results. This function is especially useful when one is trying to create somewhat realistic sample data.
Usage Notes
The RAND function returns any one of 32K distinct floating-point values in the range of
zero to one inclusive. Note that many equivalent functions in other languages (e.g. SAS)
return many more distinct values over the same range.
The values generated by the RAND function are evenly distributed over the range of zero
to one inclusive.
A seed can be provided to get reproducible results. The seed can be any valid number of
type integer. Note that the use of a seed alone does not give consistent results. Two different SQL statements using the same seed may return different (but internally consistent)
sets of pseudo-random numbers.
If the seed value is zero, the initial result will also be zero. All other seed values return
initial values that are not the same as the seed. Subsequent calls of the RAND function in
the same statement are not affected.
If there are multiple references to the RAND function in the same SQL statement, the
seed of the first RAND invocation is the one used for all.
If the seed value is not provided, the pseudo-random numbers generated will usually be
unpredictable. However, if some prior SQL statement in the same thread has already invoked the RAND function, the newly generated pseudo-random numbers "may" continue
where the prior ones left off.
The following recursive SQL generates 100,000 random numbers using two as the seed value.
The generated data is then summarized using various DB2 column functions:
WITH temp (num, ran) AS
(VALUES (INT(1)
        ,RAND(2))
 UNION ALL
 SELECT num + 1
       ,RAND()
 FROM   temp
 WHERE  num < 100000
)                                                            ANSWER
                                                             =============
SELECT COUNT(*)                              AS #rows    ==> 100000
      ,COUNT(DISTINCT ran)                   AS #values  ==> 31242
      ,DEC(AVG(ran),7,6)                     AS avg_ran  ==> 0.499838
      ,DEC(STDDEV(ran),7,6)                  AS std_dev  ==> 0.288706
      ,DEC(MIN(ran),7,6)                     AS min_ran  ==> 0.000000
      ,DEC(MAX(ran),7,6)                     AS max_ran  ==> 1.000000
      ,DEC(MAX(ran),7,6) - DEC(MIN(ran),7,6) AS range    ==> 1.000000
      ,DEC(VAR(ran),7,6)                     AS variance ==> 0.083351
FROM   temp;
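The summary figures above are what one would expect from 100,000 evenly distributed draws over roughly 32K possible values. The Python sketch below simulates that situation as a plausibility check. It is an assumption-laden model, not DB2's generator: it draws k/32767 for a random k in 0 to 32767, uses Python's own random module, and the function name summarize is hypothetical.

```python
import math
import random


def summarize(n_rows: int = 100_000, n_values: int = 32_768, seed: int = 2) -> dict:
    """Draw n_rows values from n_values evenly spaced points in [0, 1] and summarize."""
    rng = random.Random(seed)  # seeded, so the run is reproducible
    sample = [rng.randrange(n_values) / (n_values - 1) for _ in range(n_rows)]
    avg = sum(sample) / n_rows
    var = sum((x - avg) ** 2 for x in sample) / n_rows  # population variance
    return {
        "rows": n_rows,
        "values": len(set(sample)),
        "avg": avg,
        "std_dev": math.sqrt(var),
        "variance": var,
    }


stats = summarize()
# Expected number of distinct values when drawing with replacement.
expected_values = 32_768 * (1 - math.exp(-100_000 / 32_768))
print(stats)
print(round(expected_values))
```

For a uniform distribution on zero to one the theoretical mean is 0.5, the variance is 1/12 (about 0.0833) and the standard deviation is about 0.2887, all close to the answer above. The expected count of distinct values is roughly 31,200, which is consistent with the 31,242 reported.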
The RAND function creates pseudo-random numbers. This means that the output looks random, but it is actually made using a very specific formula. If the first invocation of the function uses a seed value, all subsequent invocations will return a result that is explicitly derived
from the initial seed. To illustrate this concept, the following statement selects six random
numbers. Because of the use of the seed, the same six values will always be returned when
this SQL statement is invoked (when invoked on my machine):
SELECT   deptno  AS dno
        ,RAND(0) AS ran
FROM     department
WHERE    deptno < 'E'
ORDER BY 1;
ANSWER
===========================
DNO RAN
--- ----------------------
A00 +1.15970336008789E-003
B01 +2.35572374645222E-001
C01 +6.48152104251228E-001
D01 +7.43736075930052E-002
D11 +2.70241401409955E-001
D21 +3.60026856288339E-001
SELECT   deptno AS dno
        ,RAND() AS ran
FROM     department
WHERE    deptno < 'D'
ORDER BY 1;
ANSWER
===========================
DNO RAN
--- ----------------------
A00 +2.55287331766717E-001
B01 +9.85290078432569E-001
C01 +3.18918424024171E-001
Imagine that we need to generate a set of reproducible random numbers that are within a certain range (e.g. 5 to 15). Recursive SQL can be used to make the rows, and various scalar
functions can be used to get the right range of data.
In the following example we shall make a list of three columns and ten rows. The first field is
a simple ascending sequence. The second is a set of random numbers of type smallint in the
range zero to 350 (by increments of ten). The last is a set of random decimal numbers in the
range of zero to 10,000.
WITH Temp1 (col1, col2, col3) AS
(VALUES (0
,SMALLINT(RAND(2)*35)*10
,DECIMAL(RAND()*10000,7,2))
UNION ALL
SELECT col1 + 1
,SMALLINT(RAND()*35)*10
,DECIMAL(RAND()*10000,7,2)
FROM
temp1
WHERE col1 + 1 < 10
)
SELECT *
FROM
temp1;
ANSWER
===================
COL1 COL2 COL3
---- ---- -------
   0    0 9342.32
   1  250 8916.28
   2  310 5430.76
   3  150 5996.88
   4  110 8066.34
   5   50 5589.77
   6  130 8602.86
   7  340  184.94
   8  310 5441.14
   9   70 9267.55
The RAND function generates 32K distinct random values. To get a larger set of (evenly distributed) random values, combine the result of two RAND calls in the manner shown below
for the RAN2 column:
ANSWER
===================
COL#1 RAN#1 RAN#2
----- ----- -----
30000 19698 29998
WARNING: Using the RAND function in a predicate can result in unpredictable results.
See page 360 for a detailed description of this issue.
Imagine that you want to select approximately 10% of the matching rows from some table.
The predicate in the following query will do the job:
SELECT
id
,name
FROM
staff
WHERE
RAND() < 0.1
ORDER BY id;
ANSWER
============
ID  NAME
--- --------
140 Fraye
190 Sneider
290 Quill
The following query will select five random rows from the set of matching rows. It begins (in
the nested table expression) by using the ROW_NUMBER function to assign row numbers to
the matching rows in random order (using the RAND function). Subsequently, those rows
with the five lowest row numbers are selected:
SELECT
id
,name
FROM
(SELECT s.*
,ROW_NUMBER() OVER(ORDER BY RAND()) AS r
FROM
staff s
)AS xxx
WHERE
r <= 5
ORDER BY id;
ANSWER
============
ID  NAME
--- --------
 10 Sanders
 30 Marenghi
190 Sneider
270 Lea
280 Wilson
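The same idea (give every row a random sort key, keep the rows with the lowest keys, then present them in ID order) can be sketched in Python. Everything here is illustrative: the function name pick_random_rows is hypothetical, and the row list is made-up stand-in data rather than the real STAFF table.

```python
import random


def pick_random_rows(rows: list, how_many: int, rng: random.Random) -> list:
    """Attach a random key to every row, sort by it, and keep the first how_many rows."""
    keyed = [(rng.random(), row) for row in rows]  # plays the role of ORDER BY RAND()
    keyed.sort(key=lambda pair: pair[0])           # plays the role of ROW_NUMBER()
    return [row for _, row in keyed[:how_many]]    # plays the role of WHERE r <= 5


# Stand-in data: (id, name) pairs with ids 10, 20, ... 350.
staff = [(i, f"name{i}") for i in range(10, 360, 10)]
chosen = sorted(pick_random_rows(staff, 5, random.Random(0)))  # final ORDER BY id
for row in chosen:
    print(row)
```

Unlike the RAND() < 0.1 predicate shown earlier, this approach always returns exactly the requested number of rows, provided the input has at least that many.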
Imagine that, in an act of inspired unfairness, we decide to update the salaries of a selected
set of employees to a random number in the range of zero to $10,000. This too is easy:
UPDATE staff
SET    salary = RAND()*10000
WHERE  id < 50;
REC2XML
Returns a string formatted with XML tags. See page 172 for a description of this function.
REPEAT
REPEAT ( string-to-repeat , #times )
SELECT   id
        ,CHAR(REPEAT(name,3),40)
FROM     staff
WHERE    id < 40
ORDER BY id;
ANSWER
===========================
ID 2
-- ------------------------
10 SandersSandersSanders
20 PernalPernalPernal
30 MarenghiMarenghiMarenghi
REPLACE
Replaces all occurrences of one string with another. The output is of type varchar(4000).
REPLACE ( string-to-change , search-for , replace-with )
SELECT c1
      ,REPLACE(c1,'AB','XY') AS r1
      ,REPLACE(c1,'BA','XY') AS r2
FROM   scalar;
ANSWER
======================
C1     R1     R2
------ ------ ------
ABCDEF XYCDEF ABCDEF
ABCD   XYCD   ABCD
AB     XY     AB
ANSWER
==============
C1     R1
------ ------
ABCDEF XYCDEF
ABCD   XYCD
AB     XY
RIGHT
Has two arguments: The first is an input string of type char, varchar, clob, or blob. The second is a positive integer value. The output, of type varchar(4000), is the right most characters
in the string.
WITH temp1(c1) AS
(VALUES ('  ABC')
       ,(' ABC ')
       ,('ABC  '))
SELECT c1
      ,RIGHT(c1,4)         AS c2
      ,LENGTH(RIGHT(c1,4)) AS l2
FROM   temp1;
ANSWER
================
C1    C2    L2
----- ----- --
  ABC  ABC   4
 ABC  ABC    4
ABC   BC     4
ROUND
Rounds the rightmost digits of a number (1st argument). If the second argument is positive, it
rounds to the right of the decimal place. If the second argument is negative, it rounds to the
left. A second argument of zero rounds to integer. The input and output types are the
same, except for decimal where the precision will be increased by one, if possible. Therefore,
a DEC(5,2) field will be returned as DEC(6,2), and a DEC(31,2) field as DEC(31,2). To truncate instead of round, use the TRUNCATE function.
WITH temp1(d1) AS
(VALUES (123.400)
       ,( 23.450)
       ,(  3.456)
       ,(   .056))
SELECT d1
      ,DEC(ROUND(d1,+2),6,3) AS p2
      ,DEC(ROUND(d1,+1),6,3) AS p1
      ,DEC(ROUND(d1,+0),6,3) AS p0
      ,DEC(ROUND(d1,-1),6,3) AS n1
      ,DEC(ROUND(d1,-2),6,3) AS n2
FROM   temp1;
ANSWER
===============================================
D1      P2      P1      P0      N1      N2
------- ------- ------- ------- ------- -------
123.400 123.400 123.400 123.000 120.000 100.000
 23.450  23.450  23.400  23.000  20.000   0.000
  3.456   3.460   3.500   3.000   0.000   0.000
  0.056   0.060   0.100   0.000   0.000   0.000
ANSWER
======================
C1     R1     R2 R3
------ ------ -- --
ABCDEF ABCDEF  6  6
ABCD   ABCD    6  4
AB     AB      6  2
SECOND
Returns the second (of minute) part of a time or timestamp (or equivalent) value.
SIGN
Returns -1 if the input number is less than zero, 0 if it equals zero, and +1 if it is greater than
zero. The input and output types will be the same, except for decimal which returns double.
SELECT d1
,SIGN(d1)
,f1
,SIGN(f1)
FROM
scalar;
SIN
Returns the SIN of the argument where the argument is an angle expressed in radians. The
output format is double.
WITH temp1(n1) AS
(VALUES (0)
 UNION ALL
 SELECT n1 + 10
 FROM   temp1
 WHERE  n1 < 80)
SELECT n1
      ,DEC(RADIANS(n1),4,3)      AS ran
      ,DEC(SIN(RADIANS(n1)),4,3) AS sin
      ,DEC(TAN(RADIANS(n1)),4,3) AS tan
FROM   temp1;
ANSWER
=======================
N1 RAN   SIN   TAN
-- ----- ----- -----
 0 0.000 0.000 0.000
10 0.174 0.173 0.176
20 0.349 0.342 0.363
30 0.523 0.500 0.577
40 0.698 0.642 0.839
50 0.872 0.766 1.191
60 1.047 0.866 1.732
70 1.221 0.939 2.747
80 1.396 0.984 5.671
SINH
Returns the hyperbolic sin for the argument, where the argument is an angle expressed in radians. The output format is double.
SMALLINT
ANSWER
==================================
D1    2      3      4      5
----- ------ ------ ------ ------
 -2.4     -2    123   -123    123
  0.0      0    123   -123    123
  1.8      1    123   -123    123
The various SNAPSHOT functions can be used to analyze the system. They are beyond the
scope of this book. Refer instead to the DB2 System Monitor Guide and Reference.
SOUNDEX
Returns a 4-character code representing the sound of the words in the argument. Use the
DIFFERENCE function to convert words to soundex values and then compare.
SELECT   a.name          AS n1
        ,SOUNDEX(a.name) AS s1
        ,b.name          AS n2
        ,SOUNDEX(b.name) AS s2
        ,DIFFERENCE
         (a.name,b.name) AS df
FROM     staff a
        ,staff b
WHERE    a.id = 10
  AND    b.id > 150
  AND    b.id < 250
ORDER BY df DESC
        ,n2 ASC;
ANSWER
==============================
N1      S1   N2        S2   DF
------- ---- --------- ---- --
Sanders S536 Sneider   S536  4
Sanders S536 Smith     S530  3
Sanders S536 Lundquist L532  2
Sanders S536 Daniels   D542  1
Sanders S536 Molinare  M456  1
Sanders S536 Scoutten  S350  1
Sanders S536 Abrahams  A165  0
Sanders S536 Kermisch  K652  0
Sanders S536 Lu        L000  0
There are several minor variations on the SOUNDEX algorithm. Below is one example:
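The first letter of the name is left unchanged.
The letters W and H are ignored.
The vowels, A, E, I, O, U, and Y are not coded, but are used as separators (see last item).
The remaining letters are coded as:
B, P, F, V                1
C, G, J, K, Q, S, X, Z    2
D, T                      3
L                         4
M, N                      5
R                         6
Letters that follow letters with same code are ignored unless a separator (see the third
item above) precedes them.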
The result of the above calculation is a four byte value. The first byte is a character as defined
in step one. The remaining three bytes are digits as defined in steps two through four. Output
longer than four bytes is truncated. If the output is not long enough, it is padded on the right
with zeros. The maximum number of distinct values is 8,918.
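As a rough illustration of this kind of algorithm, here is a Python sketch of one common SOUNDEX variant. It is not DB2's implementation. It assumes the widely used letter-to-digit table (B, F, P, V = 1; C, G, J, K, Q, S, X, Z = 2; D, T = 3; L = 4; M, N = 5; R = 6), treats vowels and Y as separators, and skips H and W without treating them as separators. The function name soundex is simply a convenient label.

```python
# Assumed letter-to-digit table (the common American Soundex coding).
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return a four-byte code: first letter plus three digits, zero padded."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]                  # the first letter is kept as-is
    prev = CODES.get(letters[0], "")     # its code still suppresses an immediate repeat
    for ch in letters[1:]:
        if ch in "HW":                   # ignored, and not a separator
            continue
        code = CODES.get(ch, "")         # vowels and Y give "", which acts as a separator
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]          # truncate or pad to four bytes


for name in ("Sanders", "Sneider", "Smith", "Lundquist", "Scoutten", "Lu"):
    print(name, soundex(name))
```

With these assumptions the sketch reproduces the S1 and S2 codes shown in the answer above. The figure of 8,918 distinct values follows from 26 possible first letters times 7 x 7 x 7 digit combinations (each digit being 0 to 6).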
NOTE: The SOUNDEX function is something of an industry standard that was developed
several decades ago. Since that time, several other similar functions have been developed. You may want to investigate writing your own DB2 function to search for similar-sounding names.
SPACE
ANSWER
==================
N1 S1   S2 S3
-- ---- -- ----
 1       1  X
 2       2   X
 3       3    X
DB2 maintains a dynamic SQL statement cache. It also has several fields that record usage of
the SQL statements in the cache. The following command can be used to access this data:
SELECT *
FROM   TABLE(SQLCACHE_SNAPSHOT()) SS
WHERE  SS.NUM_EXECUTIONS <> 0;
SELECT   ORDINAL           AS COLNO
        ,CHAR(PARMNAME,18) AS COLNAME
        ,TYPENAME          AS COLTYPE
        ,LENGTH
        ,SCALE
FROM     SYSCAT.FUNCPARMS
WHERE    FUNCSCHEMA = 'SYSFUN'
  AND    FUNCNAME   = 'SQLCACHE_SNAPSHOT'
ORDER BY COLNO;
SQRT
Returns the square root of the input value, which can be any positive number. The output
format is double.
WITH temp1(n1) AS
(VALUES (0.5),(0.0)
       ,(1.0),(2.0))
SELECT DEC(n1,4,3)       AS n1
      ,DEC(SQRT(n1),4,3) AS s1
FROM   temp1;
ANSWER
============
N1    S1
----- -----
0.500 0.707
0.000 0.000
1.000 1.000
2.000 1.414
SUBSTR
Returns part of a string. If the length is not provided, the output is from the start value to the
end of the string.
SUBSTR ( string , start [ , length ] )
ANSWER
=========================
LEN DAT1      LDAT SUBDAT
--- --------- ---- ------
  6 123456789    9 123456
  4 12345        5 1234
<error>
Figure 413, SUBSTR function - error because length parm too long
The best way to avoid the above problem is to simply write good code. If that sounds too
much like hard work, try the following SQL:
WITH temp1 (len, dat1) AS
(VALUES    (  6,'123456789')
          ,(  4,'12345'    )
          ,( 16,'123'      )
)
SELECT     len
          ,dat1
          ,LENGTH(dat1) AS ldat
          ,SUBSTR(dat1,1,CASE
                            WHEN len < LENGTH(dat1) THEN len
                            ELSE LENGTH(dat1)
                         END ) AS subdat
FROM       temp1;
ANSWER
=========================
LEN DAT1      LDAT SUBDAT
--- --------- ---- ------
  6 123456789    9 123456
  4 12345        5 1234
 16 123          3 123
Figure 414, SUBSTR function - avoid error using CASE (see previous)
In the above SQL a CASE statement is used to compare the LEN value against the length of
the DAT1 field. If the former is larger, it is replaced by the length of the latter.
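The clamping logic can be modeled in a few lines of Python. This is a sketch under stated assumptions: the function substr imitates the error raised when the requested length runs past the end of the string (Python slicing itself would silently truncate), and safe_substr applies the same "use the smaller of the two lengths" rule as the CASE expression. Both names are hypothetical.

```python
from typing import Optional


def substr(string: str, start: int, length: Optional[int] = None) -> str:
    """Model of SUBSTR with a 1-based start; raises if the range leaves the string."""
    if start < 1 or start > len(string):
        raise ValueError("start is outside the string")
    if length is None:
        return string[start - 1:]            # from start to the end of the string
    if length < 0 or start - 1 + length > len(string):
        raise ValueError("length runs past the end of the string")
    return string[start - 1:start - 1 + length]


def safe_substr(dat1: str, length: int) -> str:
    """Same rule as the CASE expression: never ask for more than LENGTH(dat1)."""
    return substr(dat1, 1, min(length, len(dat1)))


for length, dat1 in ((6, "123456789"), (4, "12345"), (16, "123")):
    print(length, dat1, len(dat1), safe_substr(dat1, length))
```

The last input pair is the one that fails in the unguarded version; with the clamp it simply returns the whole value.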
If the input is varchar, and no length value is provided, the output is varchar. However, if the
length is provided, the output is of type char - with padded blanks (if needed):
SELECT name
      ,LENGTH(name)             AS len
      ,SUBSTR(name,5)           AS s1
      ,LENGTH(SUBSTR(name,5))   AS l1
      ,SUBSTR(name,5,3)         AS s2
      ,LENGTH(SUBSTR(name,5,3)) AS l2
FROM   staff
WHERE  id < 60;
ANSWER
===========================
NAME     LEN S1   L1 S2  L2
-------- --- ---- -- --- --
Sanders    7 ers   3 ers  3
Pernal     6 al    2 al   3
Marenghi   8 nghi  4 ngh  3
O'Brien    7 ien   3 ien  3
Hanes      5 s     1 s    3
Figure 415, SUBSTR function - fixed length output if third parm. used
TABLE
There isn't really a TABLE function, but there is a TABLE phrase that returns a result, one
row at a time, from either an external (e.g. user written) function, or from a nested table expression. The TABLE phrase (function) has to be used in the latter case whenever there is a
reference in the nested table expression to a row that exists outside of the expression. An example follows:
SELECT
a.id
,a.dept
,a.salary
,b.deptsal
FROM
staff a
,TABLE
(SELECT
b.dept
,SUM(b.salary) AS deptsal
FROM
staff b
WHERE
b.dept = a.dept
GROUP BY b.dept
)AS b
WHERE
a.id
< 40
ORDER BY a.id;
ANSWER
=========================
ID DEPT SALARY   DEPTSAL
-- ---- -------- --------
10 20   18357.50 64286.10
20 20   18171.25 64286.10
30 38   17506.75 77285.55
TABLE_NAME
Returns the base view or table name for a particular alias after all alias chains have been resolved. The output type is varchar(18). If the alias name is not found, the result is the input
values. There are two input parameters. The first, which is required, is the alias name. The
second, which is optional, is the alias schema. If the second parameter is not provided, the
default schema is used for the qualifier.
CREATE ALIAS emp1 FOR employee;
CREATE ALIAS emp2 FOR emp1;
SELECT tabschema
      ,tabname
      ,card
FROM   syscat.tables
WHERE  tabname = TABLE_NAME('emp2','graeme');
ANSWER
=======================
TABSCHEMA TABNAME  CARD
--------- -------- ----
graeme    employee   -1
TABLE_SCHEMA
Returns the base view or table schema for a particular alias after all alias chains have been
resolved. The output type is char(8). If the alias name is not found, the result is the input values. There are two input parameters. The first, which is required, is the alias name. The second, which is optional, is the alias schema. If the second parameter is not provided, the default schema is used for the qualifier.
Resolving non-existent Objects
Dependent aliases are not dropped when a base table or view is removed. After the base table
or view drop, the TABLE_SCHEMA and TABLE_NAME functions continue to work fine
(see the 1st output line below). However, when the alias being checked does not exist, the
original input values (explicit or implied) are returned (see the 2nd output line below).
ANSWER
===========================
TAB_SCH  TAB_NME
-------- ------------------
graeme   fred1
graeme   xxxxx
TAN
Returns the tangent of the argument where the argument is an angle expressed in radians.
TANH
Returns the hyperbolic tan for the argument, where the argument is an angle expressed in radians. The output format is double.
TIME
A timestamp value.
TIMESTAMP_FORMAT
Takes an input string with the format: "YYYY-MM-DD HH:MM:SS" and converts it into a
valid timestamp value. The VARCHAR_FORMAT function does the inverse.
TIMESTAMP_ISO
Returns a timestamp in the ISO format (yyyy-mm-dd hh:mm:ss.nnnnnn) converted from the
IBM internal format (yyyy-mm-dd-hh.mm.ss.nnnnnn). If the input is a date, zeros are inserted
in the time part. If the input is a time, the current date is inserted in the date part and zeros in
the microsecond section.
SELECT tm1
      ,TIMESTAMP_ISO(tm1)
FROM   scalar;
ANSWER
===================================
TM1      2
-------- --------------------------
23:58:58 2000-09-01-23.58.58.000000
15:15:15 2000-09-01-15.15.15.000000
00:00:00 2000-09-01-00.00.00.000000
TIMESTAMPDIFF
Returns an integer value that is an estimate of the difference between two timestamp values.
Unfortunately, the estimate can sometimes be seriously out (see the example below), so this
function should be used with extreme care.
Arguments
There are two arguments. The first argument indicates what interval kind is to be returned.
Valid options are:
1 = Microseconds.
2 = Seconds.
4 = Minutes.
8 = Hours.
16 = Days.
32 = Weeks.
64 = Months.
128 = Quarters.
256 = Years.
The second argument is the result of one timestamp subtracted from another and then converted to character.
WITH
temp1 (ts1,ts2) AS
  (VALUES ('1996-03-01-00.00.01','1995-03-01-00.00.00')
         ,('1996-03-01-00.00.00','1995-03-01-00.00.01')),
temp2 (ts1,ts2) AS
  (SELECT TIMESTAMP(ts1)
         ,TIMESTAMP(ts2)
   FROM   temp1),
temp3 (ts1,ts2,df) AS
  (SELECT ts1
         ,ts2
         ,CHAR(TS1 - TS2) AS df
   FROM   temp2)
SELECT df
      ,TIMESTAMPDIFF(16,df)  AS dif
      ,DAYS(ts1) - DAYS(ts2) AS dys
FROM   temp3;
ANSWER
=============================
DF                    DIF DYS
--------------------- --- ---
00010000000001.000000 365 366
00001130235959.000000 360 366
The following user-defined function will get the difference, in microseconds, between two
timestamp values. It can be used as an alternative to the above:
CREATE FUNCTION ts_diff_works(in_hi TIMESTAMP,in_lo TIMESTAMP)
RETURNS BIGINT
RETURN (BIGINT(DAYS(in_hi))             * 86400000000
      + BIGINT(MIDNIGHT_SECONDS(in_hi)) * 1000000
      + BIGINT(MICROSECOND(in_hi)))
     -(BIGINT(DAYS(in_lo))              * 86400000000
      + BIGINT(MIDNIGHT_SECONDS(in_lo)) * 1000000
      + BIGINT(MICROSECOND(in_lo)));
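To see why the estimate and the exact difference disagree, the two approaches can be compared in Python. This is a sketch with labeled assumptions: estimated_days treats every year as 365 days and every month as 30 days, which is an inference from the sample output above rather than a documented rule, and ts_diff_micros does the exact subtraction that the user-defined function performs. Both names are hypothetical.

```python
from datetime import datetime, timedelta


def ts_diff_micros(in_hi: datetime, in_lo: datetime) -> int:
    """Exact difference between two timestamps, in microseconds."""
    return (in_hi - in_lo) // timedelta(microseconds=1)


def estimated_days(duration: str) -> int:
    """Estimate days from a 'yyyymmddhhmmss.nnnnnn' duration string.

    Assumes 365 days per year and 30 days per month (inferred, not documented here).
    """
    digits = duration.split(".")[0].rjust(14, "0")
    years, months, days = int(digits[0:4]), int(digits[4:6]), int(digits[6:8])
    return years * 365 + months * 30 + days


pairs = [
    (datetime(1996, 3, 1, 0, 0, 1), datetime(1995, 3, 1, 0, 0, 0), "00010000000001.000000"),
    (datetime(1996, 3, 1, 0, 0, 0), datetime(1995, 3, 1, 0, 0, 1), "00001130235959.000000"),
]
for hi, lo, df in pairs:
    exact_days = ts_diff_micros(hi, lo) / 86_400_000_000
    print(df, estimated_days(df), round(exact_days, 5))
```

The two timestamp pairs are only two seconds apart in their exact difference, yet the estimates differ by five days, because a leap year and eleven months of varying length are each flattened to fixed sizes.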
TO_CHAR
This function is a synonym for VARCHAR_FORMAT (see page 158). It converts a timestamp value into a string using a template to define the output layout.
TO_DATE
This function is a synonym for TIMESTAMP_FORMAT (see page 153). It converts a character string value into a timestamp using a template to define the input layout.
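The two synonyms can be tried out together, as in the sketch below. It assumes the same template that the VARCHAR_FORMAT example in this chapter uses; other templates may not be accepted in this release. The column names are made up for the illustration:

```sql
-- TO_CHAR: timestamp to string.  TO_DATE: string to timestamp.
SELECT TO_CHAR(CURRENT TIMESTAMP,'YYYY-MM-DD HH24:MI:SS')      AS ts_char
      ,TO_DATE('2004-11-03 23:59:59','YYYY-MM-DD HH24:MI:SS')  AS ts_val
FROM   sysibm.sysdummy1;
```

Note that, despite its name, TO_DATE returns a timestamp, not a date.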
TRANSLATE
Converts individual characters in either a character or graphic input string from one value to
another. It can also convert lower case data to upper case.
TRANSLATE ( string [ , to , from [ , substitute ] ] )
The use of the input string alone generates upper case output.
When "from" and "to" values are provided, each individual "from" character in the input
string is replaced by the corresponding "to" character (if there is one).
If there is no "to" character for a particular "from" character, those characters in the input
string that match the "from" are set to blank (if there is no substitute value).
A fourth, optional, single-character parameter can be provided that is the substitute character to be used for those "from" values having no "to" value.
If there are more "to" characters than "from" characters, the additional "to" characters are
ignored.
SELECT 'abcd'
      ,TRANSLATE('abcd')
      ,TRANSLATE('abcd','','a')
      ,TRANSLATE('abcd','A','A')
      ,TRANSLATE('abcd','A','a')
      ,TRANSLATE('abcd','A','ab')
      ,TRANSLATE('abcd','A','ab',' ')
      ,TRANSLATE('abcd','A','ab','z')
      ,TRANSLATE('abcd','AB','a')
FROM   staff
WHERE  id = 10;

ANSWER (one line per select-list item)
======================================
ANS.  NOTES
====  =================
abcd  No change
ABCD  Make upper case
 bcd  a=>' '
abcd  A=>A
Abcd  a=>A
A cd  a=>A,b=>' '
A cd  a=>A,b=>' '
Azcd  a=>A,b=>z
Abcd  a=>A
Both the REPLACE and the TRANSLATE functions alter the contents of input strings. They
differ in that the REPLACE converts whole strings while the TRANSLATE converts multiple
sets of individual characters. Also, the "to" and "from" strings are back to front.
SELECT c1
      ,REPLACE(c1,'AB','XY')
      ,REPLACE(c1,'BA','XY')
      ,TRANSLATE(c1,'XY','AB')
      ,TRANSLATE(c1,'XY','BA')
FROM   scalar
WHERE  c1 = 'ABCD';

ANSWER (one line per select-list item)
======================================
ABCD
XYCD
ABCD
XYCD
YXCD
TRUNC or TRUNCATE
Truncates (not rounds) the rightmost digits of an input number (1st argument). If the second
argument is positive, it truncates to the right of the decimal place. If the second value is negative, it truncates to the left. A second value of zero truncates to integer. The input and output
types are the same. To round instead of truncate, use the ROUND function.
WITH temp1(d1) AS
  (VALUES (123.400)
         ,( 23.450)
         ,(  3.456)
         ,(   .056))
SELECT   d1
        ,DEC(TRUNC(d1,+2),6,3) AS pos2
        ,DEC(TRUNC(d1,+1),6,3) AS pos1
        ,DEC(TRUNC(d1,+0),6,3) AS zero
        ,DEC(TRUNC(d1,-1),6,3) AS neg1
        ,DEC(TRUNC(d1,-2),6,3) AS neg2
FROM     temp1
ORDER BY 1 DESC;

ANSWER
===============================================
D1      POS2    POS1    ZERO    NEG1    NEG2
------- ------- ------- ------- ------- -------
123.400 123.400 123.400 123.000 120.000 100.000
 23.450  23.440  23.400  23.000  20.000   0.000
  3.456   3.450   3.400   3.000   0.000   0.000
  0.056   0.050   0.000   0.000   0.000   0.000
TYPE_ID
Returns the internal type identifier of the dynamic data type of the expression.
TYPE_NAME
Returns the unqualified name of the dynamic data type of the expression.
TYPE_SCHEMA
Returns the schema name of the dynamic data type of the expression.
UCASE or UPPER
Converts a mixed or lower-case string to upper case. The output is the same data type and
length as the input.
SELECT name
      ,LCASE(name) AS lname
      ,UCASE(name) AS uname
FROM   staff
WHERE  id < 30;

ANSWER
=========================
NAME    LNAME   UNAME
------- ------- -------
Sanders sanders SANDERS
Pernal  pernal  PERNAL
VALUE
Same as COALESCE.
VARCHAR
Converts the input (1st argument) to a varchar data type. The output length (2nd argument) is
optional. Trailing blanks are not removed.
SELECT c1
      ,LENGTH(c1)          AS l1
      ,VARCHAR(c1)         AS v2
      ,LENGTH(VARCHAR(c1)) AS l2
      ,VARCHAR(c1,4)       AS v3
FROM   scalar;

ANSWER
========================
C1     L1 V2     L2 V3
------ -- ------ -- ----
ABCDEF  6 ABCDEF  6 ABCD
ABCD    6 ABCD    6 ABCD
AB      6 AB      6 AB
VARCHAR_FORMAT
Converts a timestamp value into a string with the format: "YYYY-MM-DD HH:MM:SS".
The TIMESTAMP_FORMAT function does the inverse.
WITH temp1 (ts1) AS
  (VALUES (TIMESTAMP('1999-12-31-23.59.59'))
         ,(TIMESTAMP('2002-10-30-11.22.33'))
  )
SELECT   ts1
        ,VARCHAR_FORMAT(ts1,'YYYY-MM-DD HH24:MI:SS') AS ts2
FROM     temp1
ORDER BY ts1;

ANSWER
==============================================
TS1                        TS2
-------------------------- -------------------
1999-12-31-23.59.59.000000 1999-12-31 23:59:59
2002-10-30-11.22.33.000000 2002-10-30 11:22:33
VARGRAPHIC
Converts the input (1st argument) to a vargraphic data type. The output length (2nd argument)
is optional.
VEBLOB_CP_LARGE
WEEK
Returns a value in the range 1 to 53 or 54 that represents the week of the year, where a week
begins on a Sunday, or on the first day of the year. Valid input types are a date, a timestamp,
or an equivalent character value. The output is of type integer.
SELECT WEEK(DATE('2000-01-01')) AS w1
      ,WEEK(DATE('2000-01-02')) AS w2
      ,WEEK(DATE('2001-01-02')) AS w3
      ,WEEK(DATE('2000-12-31')) AS w4
      ,WEEK(DATE('2040-12-31')) AS w5
FROM   sysibm.sysdummy1;

ANSWER
==================
W1 W2 W3 W4 W5
-- -- -- -- --
 1  2  1 54 53
WEEK_ISO
Returns an integer value, in the range 1 to 53, that is the "ISO" week number. An ISO week
differs from an ordinary week in that it begins on a Monday and it does not have to begin or
end at the year boundary. Instead, week 1 is the first week of the year to contain a Thursday.
Therefore, it is possible for up to three days at the beginning of the year to appear in the last
week of the previous year. Unlike ordinary weeks, every ISO week contains seven days.
WITH
temp1 (n) AS
  (VALUES (0)
   UNION ALL
   SELECT n+1
   FROM   temp1
   WHERE  n < 10),
temp2 (dt2) AS
  (SELECT DATE('1998-12-27') + y.n YEARS
                             + d.n DAYS
   FROM   temp1 y
         ,temp1 d
   WHERE  y.n IN (0,2))
SELECT   CHAR(dt2,ISO)             dte
        ,SUBSTR(DAYNAME(dt2),1,3)  dy
        ,WEEK(dt2)                 wk
        ,DAYOFWEEK(dt2)            dy
        ,WEEK_ISO(dt2)             wi
        ,DAYOFWEEK_ISO(dt2)        di
FROM     temp2
ORDER BY 1;

ANSWER
==========================
DTE        DY  WK DY WI DI
---------- --- -- -- -- --
1998-12-27 Sun 53  1 52  7
1998-12-28 Mon 53  2 53  1
1998-12-29 Tue 53  3 53  2
1998-12-30 Wed 53  4 53  3
1998-12-31 Thu 53  5 53  4
1999-01-01 Fri  1  6 53  5
1999-01-02 Sat  1  7 53  6
1999-01-03 Sun  2  1 53  7
1999-01-04 Mon  2  2  1  1
1999-01-05 Tue  2  3  1  2
1999-01-06 Wed  2  4  1  3
2000-12-27 Wed 53  4 52  3
2000-12-28 Thu 53  5 52  4
2000-12-29 Fri 53  6 52  5
2000-12-30 Sat 53  7 52  6
2000-12-31 Sun 54  1 52  7
2001-01-01 Mon  1  2  1  1
2001-01-02 Tue  1  3  1  2
2001-01-03 Wed  1  4  1  3
2001-01-04 Thu  1  5  1  4
2001-01-05 Fri  1  6  1  5
2001-01-06 Sat  1  7  1  6
YEAR
Returns a four-digit year value in the range 0001 to 9999 that represents the year (including
the century). The input is a date or timestamp (or equivalent) value. The output is integer.
SELECT dt1
      ,YEAR(dt1) AS yr
      ,WEEK(dt1) AS wk
FROM   scalar;

ANSWER
======================
DT1        YR   WK
---------- ---- ----
1996-04-22 1996   17
1996-08-15 1996   33
0001-01-01    1    1
The PLUS function is the same old plus sign that you have been using since you were a kid. One
can use it the old-fashioned way, or as if it were a normal DB2 function, with one or two input items. If there is a single input item, then the function acts as the unary "plus" operator. If
there are two items, the function adds them:
SELECT   id
        ,salary
        ,"+"(salary)    AS s2
        ,"+"(salary,id) AS s3
FROM     staff
WHERE    id < 40
ORDER BY id;

ANSWER
=============================
ID SALARY   S2       S3
-- -------- -------- --------
10 18357.50 18357.50 18367.50
20 18171.25 18171.25 18191.25
30 17506.75 17506.75 17536.75
SELECT   empno
        ,CHAR(birthdate,ISO)                             AS bdate1
        ,CHAR(birthdate + 1 YEAR,ISO)                    AS bdate2
        ,CHAR("+"(birthdate,DEC(00010000,8)),ISO)        AS bdate3
        ,CHAR("+"(birthdate,DOUBLE(1),SMALLINT(1)),ISO)  AS bdate4
FROM     employee
WHERE    empno < '000040'
ORDER BY empno;

ANSWER
==================================================
EMPNO  BDATE1     BDATE2     BDATE3     BDATE4
------ ---------- ---------- ---------- ----------
000010 1933-08-24 1934-08-24 1934-08-24 1934-08-24
000020 1948-02-02 1949-02-02 1949-02-02 1949-02-02
000030 1941-05-11 1942-05-11 1942-05-11 1942-05-11
The MINUS works the same way as the PLUS function, but does the opposite:
SELECT   id
        ,salary
        ,"-"(salary)    AS s2
        ,"-"(salary,id) AS s3
FROM     staff
WHERE    id < 40
ORDER BY id;

ANSWER
==============================
ID SALARY   S2        S3
-- -------- --------- --------
10 18357.50 -18357.50 18347.50
20 18171.25 -18171.25 18151.25
30 17506.75 -17506.75 17476.75
SELECT   id
        ,salary
        ,salary * id    AS s2
        ,"*"(salary,id) AS s3
FROM     staff
WHERE    id < 40
ORDER BY id;

ANSWER
===============================
ID SALARY   S2        S3
-- -------- --------- ---------
10 18357.50 183575.00 183575.00
20 18171.25 363425.00 363425.00
30 17506.75 525202.50 525202.50
SELECT   id
        ,salary
        ,salary / id    AS s2
        ,"/"(salary,id) AS s3
FROM     staff
WHERE    id < 40
ORDER BY id;

ANSWER
=============================
ID SALARY   S2       S3
-- -------- -------- --------
10 18357.50 1835.750 1835.750
20 18171.25  908.562  908.562
30 17506.75  583.558  583.558
SELECT   id
        ,name || 'Z'      AS n1
        ,name CONCAT 'Z'  AS n2
        ,"||"(name,'Z')   AS n3
        ,CONCAT(name,'Z') AS n4
FROM     staff
WHERE    LENGTH(name) < 5
ORDER BY id;

ANSWER
===========================
ID  N1    N2    N3    N4
--- ----- ----- ----- -----
110 NganZ NganZ NganZ NganZ
210 LuZ   LuZ   LuZ   LuZ
270 LeaZ  LeaZ  LeaZ  LeaZ
XML Functions
The DB2 XML functions can be used to convert standard SQL (tabular) output into XML
structured data. Below is a very brief introduction to their use.
NOTE: The XML functions discussed in this chapter generate XML output. If one has the
DB2 XML extenders, one can also query XML data.
Introduction to XML
If you use XML (Extensible Markup Language), you probably know more about it than I do,
so what follows is a very brief introduction to the language. In essence, when one distributes
XML content one provides both data, and a description of the data. To illustrate the benefits
of doing this, consider the following query:
SELECT   dept
        ,name
        ,comm
FROM     staff
WHERE    dept < 30
  AND    id   < 100
ORDER BY dept
        ,name;

ANSWER
====================
DEPT NAME    COMM
---- ------- -------
  15 Hanes         -
  15 Rothman 1152.00
  20 James    128.20
  20 Pernal   612.45
  20 Sanders       -
<Emp Name="Hanes"><Dept>15</Dept><Comm></Comm></Emp>
<Emp Name="Rothman"><Dept>15</Dept><Comm>01152.00</Comm></Emp>
<Emp Name="James"><Dept>20</Dept><Comm>00128.20</Comm></Emp>
<Emp Name="Pernal"><Dept>20</Dept><Comm>00612.45</Comm></Emp>
<Emp Name="Sanders"><Dept>20</Dept><Comm></Comm></Emp>
Sub-elements must follow a consistent logical structure (e.g. salary within employee).
Attributes of elements must also make logical sense (e.g. name of employee).
XML Functions
XMLSERIALIZE
Converts XML input to CHAR, VARCHAR, or CLOB. If the input is null, the output is null.
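XMLSERIALIZE ( CONTENT xml-function AS data-type )

xml-function: xmlagg-function | xmlelement-function |
              xmlforest-function | xmlconcat-function
data-type:    { CHARACTER | CHAR } ( integer )
            | { VARCHAR | CHARACTER VARYING | CHAR VARYING } ( integer )
            | { CLOB | CHARACTER LARGE OBJECT | CHAR LARGE OBJECT } ( integer [ K ] )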
SELECT   id
        ,XMLSERIALIZE(CONTENT
            XMLELEMENT(NAME "Dept", dept)
            AS CHAR(30)) AS xmldata
FROM     staff
WHERE    id BETWEEN 20 AND 30
ORDER BY id;

ANSWER
==================
ID XMLDATA
-- ---------------
20 <Dept>20</Dept>
30 <Dept>38</Dept>
XML2CLOB
Converts XML input to a CLOB value. If the input is null, the output is null.
XML2CLOB ( xml-function )

xml-function: xmlagg-function | xmlelement-function |
              xmlforest-function | xmlconcat-function
WARNING: The XML2CLOB function is obsolete. Do not use. Use the XMLSERIALIZE
function instead.
XMLAGG
Concatenates (vertically) a set of XML data, and returns a (transient) value of type XML. If
the input is null, the output is null.
XMLAGG ( xmlelement-function [ ORDER BY sort-exp. [ ASC | DESC ] , ... ] )

SELECT   dept AS dp
        ,XMLSERIALIZE(CONTENT
            XMLAGG(
               XMLELEMENT(NAME "Nm", name)
               ORDER BY id)
            AS CHAR(40)) AS xmldata
FROM     staff
WHERE    dept < 30
  AND    id   < 80
GROUP BY dept
ORDER BY dept;

ANSWER
==================================
DP XMLDATA
-- -------------------------------
15 <Nm>Hanes</Nm><Nm>Rothman</Nm>
20 <Nm>Sanders</Nm><Nm>Pernal</Nm>
XMLCONCAT
Concatenates (horizontally) one or more XML elements. The output is of type XML.

SELECT   id
        ,XMLSERIALIZE(CONTENT
            XMLCONCAT(
               XMLELEMENT(NAME "dp", dept)
              ,XMLELEMENT(NAME "nm", name)
            )
            AS CHAR(40)) AS xmldata
FROM     staff
WHERE    dept < 30
  AND    id   < 70
ORDER BY id;

ANSWER
==============================
ID XMLDATA
-- ---------------------------
10 <dp>20</dp><nm>Sanders</nm>
20 <dp>20</dp><nm>Pernal</nm>
50 <dp>15</dp><nm>Hanes</nm>
XMLELEMENT
Generates a (transient) XML output value from one or more input arguments. The function
has the following components:
One or more input items. Null values are converted to a zero-length string.

XMLELEMENT ( NAME element-name [ , xmlattributes-function ] [ , element-content ] ... )

SELECT   XMLSERIALIZE(CONTENT
            XMLELEMENT(NAME "staff"
                      ,XMLELEMENT(NAME "nm", name)
                      ,XMLELEMENT(NAME "sc", salary, '+', comm)
                      )
            AS CHAR(90)) AS xmldata
FROM     staff
WHERE    dept < 30
  AND    id   < 60
ORDER BY id;

ANSWER
========================================================
<staff><nm>Sanders</nm><sc>18357.50+</sc></staff>
<staff><nm>Pernal</nm><sc>18171.25+00612.45</sc></staff>
XMLATTRIBUTES ( attribute-value [ AS attribute-name ] , ... )

SELECT   XMLSERIALIZE(CONTENT
            XMLELEMENT(NAME "Emp",
               XMLATTRIBUTES(name AS "Nm", dept)
            )
            AS VARCHAR(100)) AS xmldata
FROM     staff
WHERE    dept < 30
  AND    id   < 60
ORDER BY dept
        ,name;

ANSWER
==================================
<Emp Nm="Hanes" DEPT="15"></Emp>
<Emp Nm="Pernal" DEPT="20"></Emp>
<Emp Nm="Sanders" DEPT="20"></Emp>
XMLFOREST
Constructs a sequence (forest) of XML elements from the arguments. Null input arguments
are ignored. The result is an XML element.

XMLFOREST ( [ xmlnamespaces-ftn , ] element-content [ AS element-nm ] , ... )

SELECT   XMLSERIALIZE(CONTENT
            XMLFOREST(name AS "Nm", dept AS "dp", comm)
            AS VARCHAR(100)) AS xmldata
FROM     staff
WHERE    id IN (10,20)
ORDER BY id DESC;

ANSWER
===============================================
<Nm>Pernal</Nm><dp>20</dp><COMM>00612.45</COMM>
<Nm>Sanders</Nm><dp>20</dp>
XMLNAMESPACES
Constructs XML namespace declarations from the arguments. An XML namespace is one or
more URL references that are associated with an XML name. The name itself is specified in
the XMLELEMENT or XMLFOREST definition within which the XMLNAMESPACES function is
embedded.

XMLNAMESPACES ( { namespace-uri AS namespace-prefix
                | DEFAULT namespace-uri
                | NO DEFAULT } , ... )

SELECT XMLSERIALIZE(CONTENT
          XMLFOREST(
             XMLNAMESPACES(DEFAULT 'http:\t1.com'
                          ,'http:\t2.com' AS "t2"
                          ,'http:\t3.com' AS "t3")
            ,name AS "nm", salary AS "sal")
          AS VARCHAR(300)) AS xmldata
FROM   staff
WHERE  id = 20;

ANSWER (line breaks/indentation added)
===========================================
<nm xmlns="http:\t1.com"
    xmlns:t2="http:\t2.com"
    xmlns:t3="http:\t3.com">Pernal</nm>
<sal xmlns="http:\t1.com"
     xmlns:t2="http:\t2.com"
     xmlns:t3="http:\t3.com">18171.25</sal>
Below is our original query (see figure 440 on page 163) that selects some basic data:
SELECT   dept
        ,name
        ,comm
FROM     staff
WHERE    dept < 30
  AND    id   < 100
ORDER BY dept
        ,name;

ANSWER
====================
DEPT NAME    COMM
---- ------- -------
  15 Hanes         -
  15 Rothman 1152.00
  20 James    128.20
  20 Pernal   612.45
  20 Sanders       -
SELECT   XMLSERIALIZE(CONTENT
            XMLELEMENT(NAME "Emp",
               XMLELEMENT(NAME "Dept", dept),
               XMLELEMENT(NAME "Name", name),
               XMLELEMENT(NAME "Comm", comm)
            )
            AS VARCHAR(100))
FROM     staff
WHERE    dept < 30
  AND    id   < 100
ORDER BY dept
        ,name;
ANSWER
===================================================================
<Emp><Dept>15</Dept><Name>Hanes</Name><Comm></Comm></Emp>
<Emp><Dept>15</Dept><Name>Rothman</Name><Comm>01152.00</Comm></Emp>
<Emp><Dept>20</Dept><Name>James</Name><Comm>00128.20</Comm></Emp>
<Emp><Dept>20</Dept><Name>Pernal</Name><Comm>00612.45</Comm></Emp>
<Emp><Dept>20</Dept><Name>Sanders</Name><Comm></Comm></Emp>
For each column, convert the XML and provide a name (in double-quotes).
Generate a combined XML element (called "Emp") for each row of data.
Below is another variation of the above query that makes the employee name an attribute of
the "Emp" XML element:
SELECT   XMLSERIALIZE(CONTENT
            XMLELEMENT(NAME "Emp",
               XMLATTRIBUTES(name AS "Name"),
               XMLELEMENT(NAME "Dept", dept),
               XMLELEMENT(NAME "Comm", comm)
            )
            AS VARCHAR(100))
FROM     staff
WHERE    dept < 30
  AND    id   < 100
ORDER BY dept
        ,name;
ANSWER
==============================================================
<Emp Name="Hanes"><Dept>15</Dept><Comm></Comm></Emp>
<Emp Name="Rothman"><Dept>15</Dept><Comm>01152.00</Comm></Emp>
<Emp Name="James"><Dept>20</Dept><Comm>00128.20</Comm></Emp>
<Emp Name="Pernal"><Dept>20</Dept><Comm>00612.45</Comm></Emp>
<Emp Name="Sanders"><Dept>20</Dept><Comm></Comm></Emp>
The next query illustrates how XMLELEMENT converts various DB2 data types:
SELECT XMLSERIALIZE(CONTENT
          XMLELEMENT(NAME "Data",
             XMLELEMENT(NAME "Chr1", CHAR     (c1,3)),
             XMLELEMENT(NAME "Chr2", CHAR     (c1,5)),
             XMLELEMENT(NAME "VChr", VARCHAR  (c1,5)),
             XMLELEMENT(NAME "Dec1", DECIMAL  (n1,7,2)),
             XMLELEMENT(NAME "Dec2", DECIMAL  (n2,9,1)),
             XMLELEMENT(NAME "Flt1", FLOAT    (n2)),
             XMLELEMENT(NAME "Int1", INTEGER  (n1)),
             XMLELEMENT(NAME "Int2", INTEGER  (n2)),
             XMLELEMENT(NAME "Time", TIME     (t1)),
             XMLELEMENT(NAME "Date", DATE     (t1)),
             XMLELEMENT(NAME "Ts"  , TIMESTAMP(t1))
          )
          AS VARCHAR(300)) AS xmldata
FROM  (SELECT 'ABC'                                    AS c1
             ,1234.56                                  AS n1
             ,1234567                                  AS n2
             ,TIMESTAMP('2004-09-14-22.33.44.123456')  AS t1
       FROM   staff
       WHERE  id = 10
      )AS xxx;

ANSWER (line-breaks/indentation added)
======================================
<Data>
   <Chr1>ABC</Chr1>
   <Chr2>ABC  </Chr2>
   <VChr>ABC</VChr>
   <Dec1>01234.56</Dec1>
   <Dec2>01234567.0</Dec2>
   <Flt1>1.234567E6</Flt1>
   <Int1>1234</Int1>
   <Int2>1234567</Int2>
   <Time>22:33:44</Time>
   <Date>2004-09-14</Date>
   <Ts>2004-09-14T22:33:44.123456</Ts>
</Data>
Character columns, which are displayed to their defined length using trailing blanks.
Decimal columns, which are given leading and trailing zeros - up to their defined size.
SELECT   XMLSERIALIZE(CONTENT
            XMLELEMENT(NAME "Emp", dept, name, comm)
            AS CHAR(50)) AS outdata
FROM     staff
WHERE    dept < 30
  AND    id   < 100
ORDER BY dept
        ,name;

ANSWER
===========================
<Emp>15Hanes</Emp>
<Emp>15Rothman01152.00</Emp
<Emp>20James00128.20</Emp>
<Emp>20Pernal00612.45</Emp>
<Emp>20Sanders</Emp>
SELECT   XMLSERIALIZE(CONTENT
            XMLELEMENT(NAME "Emp", CHAR(dept) || name || CHAR(comm))
            AS CHAR(50)) AS outdata
FROM     staff
WHERE    dept < 30
  AND    id   < 100
ORDER BY dept
        ,name;

ANSWER
=================================
<Emp></Emp>
<Emp>15    Rothman01152.00 </Emp>
<Emp>20    James00128.20 </Emp>
<Emp>20    Pernal00612.45 </Emp>
<Emp></Emp>
In the query below the employee name is listed as an attribute, while the dept-number and
commission are treated as elements:
SELECT   XMLSERIALIZE(CONTENT
            XMLELEMENT(NAME "Emp",
               XMLATTRIBUTES(name), dept, comm
            )
            AS CHAR(100)) AS xmldata
FROM     staff
WHERE    dept < 30
  AND    id   < 100
ORDER BY dept
        ,name;

ANSWER
====================================
<Emp NAME="Hanes">15</Emp>
<Emp NAME="Rothman">1501152.00</Emp>
<Emp NAME="James">2000128.20</Emp>
<Emp NAME="Pernal">2000612.45</Emp>
<Emp NAME="Sanders">20</Emp>
SELECT   XMLSERIALIZE(CONTENT
            XMLELEMENT(NAME "Emp",
               XMLATTRIBUTES(name, dept, comm)
            )
            AS VARCHAR(100)) AS xmldata
FROM     staff
WHERE    dept < 30
  AND    id   < 100
ORDER BY dept
        ,name;

ANSWER
====================================================
<Emp NAME="Hanes" DEPT="15"></Emp>
<Emp NAME="Rothman" DEPT="15" COMM="01152.00"></Emp>
<Emp NAME="James" DEPT="20" COMM="00128.20"></Emp>
<Emp NAME="Pernal" DEPT="20" COMM="00612.45"></Emp>
<Emp NAME="Sanders" DEPT="20"></Emp>
SELECT   XMLSERIALIZE(CONTENT
            XMLELEMENT(NAME "Emp",
               XMLATTRIBUTES(name AS "Nm", dept AS "Dpt", comm)
            )
            AS VARCHAR(100)) AS xmldata
FROM     staff
WHERE    dept < 30
  AND    id   < 100
ORDER BY dept
        ,name;

ANSWER
=================================================
<Emp Nm="Hanes" Dpt="15"></Emp>
<Emp Nm="Rothman" Dpt="15" COMM="01152.00"></Emp>
<Emp Nm="James" Dpt="20" COMM="00128.20"></Emp>
<Emp Nm="Pernal" Dpt="20" COMM="00612.45"></Emp>
<Emp Nm="Sanders" Dpt="20"></Emp>
In our sample data there are multiple employees per department. We can use the XMLAGG function to structure our XML output so that the name and commission elements are within an
employee element, and the employees are within a department element. This way, we don't
have to repeat the department value:
SELECT   XMLSERIALIZE(CONTENT
            XMLELEMENT(NAME "Dpt",
               XMLATTRIBUTES(dept),
               XMLAGG(
                  XMLELEMENT(NAME "Emp",
                     XMLELEMENT(NAME "Nm", name),
                     XMLELEMENT(NAME "Cm", comm))
                  ORDER BY id)
            )
            AS VARCHAR(300)) AS xmldata
FROM     staff
WHERE    dept < 30
  AND    id   < 100
GROUP BY dept;

ANSWER (line-breaks/indentation added)
===============================================
<Dpt DEPT="15">
   <Emp><Nm>Hanes</Nm><Cm></Cm></Emp>
   <Emp><Nm>Rothman</Nm><Cm>01152.00</Cm></Emp>
</Dpt>
<Dpt DEPT="20">
   <Emp><Nm>Sanders</Nm><Cm></Cm></Emp>
   <Emp><Nm>Pernal</Nm><Cm>00612.45</Cm></Emp>
   <Emp><Nm>James</Nm><Cm>00128.20</Cm></Emp>
</Dpt>
As the above query illustrates, the XMLAGG function is used within a GROUP BY statement. The field that one is grouping on is defined as an attribute - outside the aggregation.
The field, or fields, that are being grouped are defined as elements or attributes within the
aggregation.
REC2XML Function
The REC2XML function accepts as input a set of columns and returns as output a string that
has the column names and data wrapped in XML tags.
REC2XML ( decimal-value , format-type , row-tag-string , column-name , ... )

The first input parameter is a decimal number ranging from 0.0 to 6.0. This is an estimate
of how much longer (than the default) the output string has to be defined in order to hold
all the data. A number larger than 1.0 is needed if the input data has many characters that
are used in XML like "<", which has to be converted to "&lt;" in the output.
The third input parameter is the value used to identify each row. The default is row.
The fourth and subsequent input parameters are the list of columns to be used.
The query below illustrates how characters that are used by XML are replaced with equivalent output values:
WITH temp1 (indata) AS
  (VALUES ('<txt')
         ,('txt>')
         ,('&txt')
         ,('"txt')
         ,('''txt'))
SELECT indata
      ,REC2XML (1.0, 'COLATTVAL', 'row', indata) AS outdata
FROM   temp1;

ANSWER
==========================================================
INDATA OUTDATA
------ ---------------------------------------------------
<txt   <row><column name="INDATA">&lt;txt</column></row>
txt>   <row><column name="INDATA">txt&gt;</column></row>
&txt   <row><column name="INDATA">&amp;txt</column></row>
"txt   <row><column name="INDATA">&quot;txt</column></row>
'txt   <row><column name="INDATA">&apos;txt</column></row>
There are several differences between the REC2XML and XMLELEMENT functions:
The REC2XML function converts a single quote to "&apos;", while the XMLELEMENT
function leaves it as is. All other characters used by XML are treated the same.
The REC2XML function converts a null value to the output string: "null="true"", while
the XMLELEMENT function converts the same to a zero-length string.
The XMLELEMENT function can be used with the XMLAGG function to aggregate the
values in a GROUP BY list. The REC2XML function has no equivalent capability.
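The null-handling difference can be seen by running both functions against the same column. The query below is a minimal sketch, assuming the sample STAFF table, where the commission is null for id 10 and populated for id 20; the column names rec_out and elm_out are made up for the illustration:

```sql
-- Same input column, two ways of wrapping it in XML tags
SELECT   id
        ,REC2XML(1.0, 'COLATTVAL', 'row', comm)  AS rec_out
        ,XMLSERIALIZE(CONTENT
            XMLELEMENT(NAME "row", comm)
            AS VARCHAR(100))                     AS elm_out
FROM     staff
WHERE    id IN (10,20)
ORDER BY id;
```

For the row with the null commission, one would expect the REC2XML output to carry the null="true" marker, and the XMLELEMENT output to be an empty element.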
Sample Query
The next query uses the REC2XML function to convert the selected list of fields and rows
into standard XML output. Because the COLATTVAL option is specified, the quote in the
name "O'Brien" will be converted to "&apos;":
SELECT
FROM
WHERE
ORDER BY
External scalar functions use an external process (e.g. a C program), and possibly also an
external data source, to return a single value.
External table functions use an external process, and possibly also an external data
source, to return a set of rows and columns.
Internal scalar functions use compound SQL code to return a single value.
Internal table functions use compound SQL code to return a set of rows and columns.
This chapter will briefly go over the last three types of function listed above. See the official
DB2 documentation for more details.
WARNING: As of the time of writing, there is a known bug in DB2 that causes the prepare
cost of a dynamic SQL statement to go up exponentially when a user defined function that
is written in the SQL language is referred to multiple times in a single SQL statement.
Sourced Functions
A sourced function is used to redefine an existing DB2 function so as to in some way restrict
or enhance its applicability. Below is the basic syntax:
CREATE FUNCTION function-name ( [ parm-name ] data-type , ... )
   RETURNS data-type
   [ SPECIFIC specific-name ]
   SOURCE { function-name [ ( data-type , ... ) ]
          | SPECIFIC specific-name }
SELECT   id           AS ID
        ,DIGITS(id)   AS I2
        ,digi_int(id) AS I3
FROM     staff
WHERE    id < 40
ORDER BY id;

ANSWER
==============
ID I2    I3
-- ----- -----
10 00010 00010
20 00020 00020
30 00030 00030
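The statement that created digi_int is not shown with the query above. A minimal sketch, assuming it was sourced on the built-in DIGITS function for SMALLINT input (the signature is an assumption inferred from the query output, not taken from the text), might look like this:

```sql
-- Assumed definition: a sourced function that accepts SMALLINT input only
CREATE FUNCTION digi_int (SMALLINT)
RETURNS CHAR(5)
SOURCE SYSIBM.DIGITS(SMALLINT);
```

With a definition like this, the function is restricted to the one input type. A call that passes an INTEGER value finds no matching function, because an INTEGER is not promoted down to a SMALLINT.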
SELECT id
      ,digi_int(INT(id))
FROM   staff
WHERE  id < 50;

ANSWER
=======
<error>
NOT NULL
NOT NULL);
ANSWER
==========
ID balance
-- -------
 1  111.11
 2  222.22

SELECT   id
        ,balance * 10
FROM     customers
ORDER BY id;

ANSWER
=======
<error>

SELECT   id
        ,balance * 10 AS newbal
FROM     customers
ORDER BY id;

ANSWER
==========
ID NEWBAL
-- -------
 1 1111.10
 2 2222.20
SELECT   id
        ,"*"(balance,10) AS newbal
FROM     customers
ORDER BY id;

ANSWER
==========
ID NEWBAL
-- -------
 1 1111.10
 2 2222.20
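The statements that set up this example (a distinct type, the table, and a sourced multiplication function) are not shown with the queries above. The sketch below shows one way they could be written. The type name us_dollars and the column definitions are assumptions made for the illustration:

```sql
-- Hypothetical distinct type based on a decimal value
CREATE DISTINCT TYPE us_dollars AS DECIMAL(7,2) WITH COMPARISONS;

-- Table whose balance column uses the distinct type
CREATE TABLE customers
(id       SMALLINT   NOT NULL
,balance  us_dollars NOT NULL);

-- Sourced function: lets "*" accept the distinct type by reusing the
-- built-in decimal multiplication
CREATE FUNCTION "*" (us_dollars, INTEGER)
RETURNS us_dollars
SOURCE SYSIBM."*"(DECIMAL, INTEGER);
```

Until the sourced function exists, multiplying the balance fails, because no built-in "*" accepts the distinct type. Once it is defined, both the infix form (balance * 10) and the function form ("*"(balance,10)) resolve to it.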
Scalar Functions
A scalar function has as input a specific number of values (i.e. not a table) and returns a single
output item. Here is the syntax (also for table function):
CREATE FUNCTION function-name ( [ parm-name data-type , ... ] )
   RETURNS { data-type
           | TABLE ( column-name column-type , ... ) }
   [ LANGUAGE SQL ]
   [ NOT DETERMINISTIC | DETERMINISTIC ]
   [ EXTERNAL ACTION | NO EXTERNAL ACTION ]
   [ READS SQL DATA | CONTAINS SQL ]
   [ STATIC DISPATCH ]
   [ CALLED ON NULL INPUT ]
   [ PREDICATES ( predicate-list ) ]
   RETURN { value
          | NULL
          | [ WITH common-table-expression , ... ] full-select }
FUNCTION NAME: A qualified or unqualified name, that along with the number and
type of parameters, uniquely identifies the function.
RETURNS: The type of value returned, if a scalar function. For a table function, the list
of columns, with their type.
LANGUAGE SQL: This is the default, and the only one that is supported.
DETERMINISTIC: Specifies whether the function always returns the same result for a
given input. For example, a function that multiplies the input number by ten is deterministic, whereas a function that gets the current timestamp is not. The optimizer needs to
know this information.
EXTERNAL ACTION: Whether the function takes some action, or changes some object
that is not under the control of DB2. The optimizer needs to know this information.
READS SQL DATA: Whether the function reads SQL data only, or doesn't even do that.
The function cannot modify any DB2 data, except via an external procedure call.
STATIC DISPATCH: At function resolution time, DB2 chooses the function to run
based on the parameters of the function.
CALLED ON NULL INPUT: The function is called, even when the input is null.
PREDICATES: For predicates using this function, this clause lists those that can use the
index extensions. If this clause is specified, the function must also be DETERMINISTIC
with NO EXTERNAL ACTION. See the DB2 documentation for details.
One can have multiple scalar functions with the same name and different input/output data
types, but not with the same name and input/output types, but with different lengths. So if one
wants to support all possible input/output lengths for, say, varchar data, one has to define the
input and output lengths to be the maximum allowed for the field type.
For varchar input, one would need an output length of 32,672 bytes to support all possible
input values. But this is a problem, because it is very close to the maximum allowable table
(row) length in DB2, which is 32,677 bytes.
Decimal field types are even more problematic, because one needs to define both a length and
a scale. To illustrate, imagine that one defines the input as being of type decimal(31,12). The
following input values would be treated thus:
In addition to the examples shown in this section, there are also the following:
Pause SQL statement (by looping) for "n" seconds - page 351.
SELECT id             AS id
      ,returns_zero() AS zz
FROM   staff
WHERE  id = 10;

ANSWER
======
ID ZZ
-- --
10  0
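The definition of returns_zero is not shown with the query above. A minimal sketch of a function that takes no input and behaves this way, with the return type assumed, could be:

```sql
-- Assumed definition: no input parameters, always returns zero
CREATE FUNCTION returns_zero()
RETURNS SMALLINT
RETURN 0;
```

Even with no parameters, the empty parentheses are needed both in the definition and in every call.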
Two functions can be created with the same name. Which one is used depends on the input
type that is provided:
CREATE FUNCTION calc(inval SMALLINT) RETURNS INT RETURN inval * 10;
CREATE FUNCTION calc(inval INTEGER) RETURNS INT RETURN inval * 5;
SELECT   id                  AS id
        ,calc(SMALLINT(id))  AS c1
        ,calc(INTEGER (id))  AS C2
FROM     staff
WHERE    id < 30
ORDER BY id;

ANSWER
==========
ID C1  C2
-- --- ---
10 100  50
20 200 100
SELECT   id     AS id
        ,rnd(1) AS RND
FROM     staff
WHERE    id < 40
ORDER BY id;

ANSWER
======
ID RND
-- ---
10  37
20   8
30  42
SELECT   id          AS id
        ,get_sal(id) AS salary
FROM     staff
WHERE    id < 40
ORDER BY id;

ANSWER
===========
ID SALARY
-- --------
10 18357.50
20 18171.25
30 17506.75
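The definition of get_sal is not shown with the query above. A scalar function can get its result from a query, so a minimal sketch, assuming the function simply looks up the salary for the staff id it is given, could be:

```sql
-- Assumed definition: return the salary of the staff row with the given id
CREATE FUNCTION get_sal(inval SMALLINT)
RETURNS DECIMAL(7,2)
RETURN SELECT salary
       FROM   staff
       WHERE  id = inval;
```

The query in the RETURN must not return more than one row. Here that holds as long as the id values in STAFF are unique.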
SELECT   id          AS id
        ,salary      AS SAL1
        ,max_sal(id) AS SAL2
FROM     staff
WHERE    id < 40
ORDER BY id;

ANSWER
====================
ID SAL1     SAL2
-- -------- --------
10 18357.50 22959.20
20 18171.25 18357.50
30 17506.75 19260.25
UPDATE staff
SET    name = remove_e(name)
WHERE  id < 40;
Below is an example of a scalar function that uses compound SQL to reverse the contents of a
text string:
--#SET DELIMITER !

IMPORTANT: This example uses an "!" as the stmt delimiter.
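The function itself is not shown here, so the following is only a sketch of how such a function could be written with compound SQL. The function name reverse_str, the variable names, and the 50-byte length are all assumptions made for the illustration. It walks the input from the last non-blank byte back to the first, appending each byte to the output, so it is only suitable for single-byte character data:

```sql
--#SET DELIMITER !

-- Hypothetical name and lengths, chosen for the illustration
CREATE FUNCTION reverse_str(instr VARCHAR(50))
RETURNS VARCHAR(50)
BEGIN ATOMIC
   DECLARE outstr  VARCHAR(50) DEFAULT '';
   DECLARE curbyte SMALLINT    DEFAULT 0;
   -- Start at the last non-blank byte and walk back to the first
   SET curbyte = LENGTH(RTRIM(instr));
   WHILE curbyte >= 1 DO
      SET outstr  = outstr || SUBSTR(instr,curbyte,1);
      SET curbyte = curbyte - 1;
   END WHILE;
   RETURN outstr;
END!

-- Usage: show each name next to its reversed form
SELECT   id                AS id
        ,name              AS name1
        ,reverse_str(name) AS name2
FROM     staff
WHERE    id < 40
ORDER BY id!
```

The "!" delimiter is needed because the function body contains semicolons, which would otherwise end the CREATE statement early.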
If the length of the input string is less than 7, the result is set to -1.
SELECT   id              AS id
        ,name            AS name1
        ,check_len(name) AS name2
FROM     staff
WHERE    id < 60
ORDER BY id!

IMPORTANT: This example uses an "!" as the stmt delimiter.

ANSWER
=================
ID NAME1    NAME2
-- -------- -----
10 Sanders      7
20 Pernal      -1
30 Marenghi     8
40 O'Brien      7
<error>
Table Functions
A table function is very similar to a scalar function, except that it returns a set of rows and
columns, rather than a single value. Here is an example:
CREATE FUNCTION get_staff()
RETURNS TABLE (ID   SMALLINT
              ,name VARCHAR(9)
              ,YR   SMALLINT)
RETURN SELECT id
             ,name
             ,years
       FROM   staff;

SELECT   *
FROM     TABLE(get_staff()) AS s
WHERE    id < 40
ORDER BY id;

ANSWER
==============
ID NAME     YR
-- -------- --
10 Sanders   7
20 Pernal    8
30 Marenghi  5
Description
The basic syntax for selecting from a table function goes as follows:
FROM TABLE ( function-name ( input-parameter , ... ) ) AS correlation-name
     [ ( column-name , ... ) ]

The TABLE keyword, the function name (obviously), the two sets of parentheses, and a
correlation name are all required.
If the function has input parameters, they are all required, and their type must match.
Optionally, one can list all of the columns that are returned by the function, giving each
an assigned name.
SELECT *
FROM   TABLE(get_st(30)) AS sss (id, nnn, yy);

ANSWER
==============
ID NNN      YY
-- -------- --
30 Marenghi  5
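The definition of get_st is not shown with the query above. A minimal sketch, assuming it is a variation of get_staff that takes the staff id as input, could be:

```sql
-- Assumed definition: return the one staff row that matches the input id
CREATE FUNCTION get_st(inval INTEGER)
RETURNS TABLE (id   SMALLINT
              ,name VARCHAR(9)
              ,yr   SMALLINT)
RETURN SELECT id
             ,name
             ,years
       FROM   staff
       WHERE  id = inval;
```

The input parameter is defined as INTEGER because that is the type of the literal 30 in the call. The column names in the correlation clause (id, nnn, yy) replace those declared in the RETURNS TABLE list.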
Examples
A table function returns a table, but it doesn't have to touch a table. To illustrate, the following function creates the data on the fly:
CREATE FUNCTION make_data()
RETURNS TABLE (KY  SMALLINT
              ,DAT CHAR(5))
RETURN WITH temp1 (k#) AS (VALUES (1),(2),(3))
       SELECT k#
             ,DIGITS(SMALLINT(k#))
       FROM   temp1;

SELECT *
FROM   TABLE(make_data()) AS ttt;

ANSWER
========
KY DAT
-- -----
 1 00001
 2 00002
 3 00003
ORDER BY { { column-name | column# | expression } [ ASC | DESC ] , ...
         | ORDER OF table-designator
         | INPUT SEQUENCE }
For an insert, the order in which the rows were inserted (see page 55).
Also note:
One can have multiple ORDER BY statements in a query, but only one per sub-select.
Specifying the same field multiple times in an ORDER BY list is allowed, but silly. Only
the first specification of the field will have any impact on the output order.
If the ORDER BY column list does not uniquely identify each row, any rows with duplicate values will come out in random order. This is almost always the wrong thing to do
when the data is being displayed to an end-user.
Use the TRANSLATE function to order data regardless of case. Note that this trick may
not work consistently with some European character sets.
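The last two points can be combined. The sketch below, written against the SEQ_DATA table used in the examples in this section, orders on the upper-case form of each column and then adds the raw columns as tie-breakers, so that rows differing only in case still come out in a repeatable order:

```sql
-- Case-insensitive order first, then the raw values to break ties
SELECT   col1
        ,col2
FROM     seq_data
ORDER BY TRANSLATE(col1)
        ,TRANSLATE(col2)
        ,col1
        ,col2;
```

Rows that are identical in every ordered column can still be returned in either order, but in that case the output looks the same either way.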
Sample Data
Order by Examples
SELECT   col1
        ,col2
FROM     seq_data
ORDER BY col1 ASC
        ,col2;

ANSWER
=========
COL1 COL2
---- ----
ab   xy
ac   XY
Ab   12
AB   xy
AB   XY

SEQ_DATA
+---------+
|COL1|COL2|
|----+----|
|ab  |xy  |
|AB  |xy  |
|ac  |XY  |
|AB  |XY  |
|Ab  |12  |
+---------+
SELECT col1
      ,col2
FROM   seq_data
ORDER BY TRANSLATE(col1) ASC
        ,TRANSLATE(col2) ASC;

ANSWER
=========
COL1 COL2
---- ----
Ab   12
ab   xy
AB   XY
AB   xy
ac   XY
ANSWER
======
COL2
----
xy
XY
12
xy
XY
SELECT col1
      ,col2
FROM   seq_data
ORDER BY SUBSTR(col1,2) DESC
        ,col2
        ,1;

ANSWER
=========
COL1 COL2
---- ----
ac   XY
AB   xy
AB   XY
Ab   12
ab   xy
SELECT col1
      ,HEX(col1) AS hex1
      ,col2
      ,HEX(col2) AS hex2
FROM   seq_data
ORDER BY HEX(col1)
        ,HEX(col2);

ANSWER
===================
COL1 HEX1 COL2 HEX2
---- ---- ---- ----
AB   4142 XY   5859
AB   4142 xy   7879
Ab   4162 12   3132
ab   6162 xy   7879
ac   6163 XY   5859
One can order by the result of a nested ORDER BY, thus enabling one to order by a column
that is not in the input - as is done below:
SELECT col1
FROM  (SELECT col1
       FROM   seq_data
       ORDER BY col2
      ) AS xxx
ORDER BY ORDER OF xxx;

ANSWER
======
COL1
----
Ab
ab
AB
ac
AB

SEQ_DATA
+---------+
|COL1|COL2|
|----+----|
|ab  |xy  |
|AB  |xy  |
|ac  |XY  |
|AB  |XY  |
|Ab  |12  |
+---------+
SELECT *
FROM  (SELECT *
       FROM  (SELECT *
              FROM   seq_data
              ORDER BY col2
             )AS xxx
       ORDER BY ORDER OF xxx
               ,SUBSTR(col1,2)
      )AS yyy
ORDER BY ORDER OF yyy
        ,col1;

ANSWER
=========
COL1 COL2
---- ----
Ab   12
ab   xy
AB   xy
AB   XY
ac   XY
One can select from an insert statement (see page 55) to see what was inserted. Order by the
INPUT SEQUENCE to display the rows in the order that they were inserted:
SELECT empno
      ,projno AS prj
      ,actno  AS act
      ,ROW_NUMBER() OVER() AS r#
FROM   FINAL TABLE
      (INSERT INTO emp_act (empno, projno, actno)
       VALUES ('400000','ZZZ',999)
             ,('400000','VVV',111))
ORDER BY INPUT SEQUENCE;

ANSWER
=================
EMPNO  PRJ ACT R#
------ --- --- --
400000 ZZZ 999  1
400000 VVV 111  2
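GROUP BY syntax (diagram): GROUP BY is followed by a comma-separated list of items, each of which is an expression, a GROUPING SETS ( ... ) list, a ROLLUP ( expression, ... ) list, or a CUBE ( expression, ... ) list. An optional HAVING search-condition(s) phrase may follow.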
There can only be one GROUP BY per SELECT. Multiple select statements in the same
query can each have their own GROUP BY.
Every field in the SELECT list must either be specified in the GROUP BY, or must have
a column function applied against it.
The result of a simple GROUP BY is always a distinct set of rows, where the unique
identifier is whatever fields were grouped on.
Only expressions that return the same value each time they are evaluated for a given row (e.g. a column name, a constant) can be referenced in a GROUP BY. For example, one cannot group on the RAND function, as its result varies from one call to the next. To reference such a value in a GROUP BY, resolve
it beforehand using a nested-table-expression.
Variable length character fields with differing numbers of trailing blanks are treated as
equal in the GROUP BY. The number of trailing blanks, if any, in the result is unpredictable.
When grouping, all null values in the GROUP BY fields are considered equal.
There is no guarantee that the rows resulting from a GROUP BY will come back in any
particular order. If this is a problem, use an ORDER BY.
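The rule above about values that vary from call to call can be sketched as follows. This is a minimal illustration only, assuming the sample STAFF table; the column name RAN and the correlation name XXX are made up for the example. The random value is resolved once per row in the nested-table-expression, and the outer query then groups on the resolved column:

```sql
-- Resolve the RAND value in a nested-table-expression first,
-- then group on the resulting (now fixed) column.
SELECT   ran
        ,COUNT(*) AS #rows
FROM    (SELECT SMALLINT(RAND() * 3) AS ran   -- hypothetical column name
         FROM   staff
        ) AS xxx
GROUP BY ran
ORDER BY ran;
```

The ORDER BY is included because, as noted above, a GROUP BY alone does not guarantee any particular row order.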
GROUP BY Flavors
A typical GROUP BY that encompasses one or more fields is actually a subset of the more
general GROUPING SETS command. In a grouping set, one can do the following:
Summarize the selected data by the items listed such that one row is returned per unique
combination of values. This is an ordinary GROUP BY.
Summarize the selected data using multiple independent fields. This is equivalent to doing multiple independent GROUP BY statements - with the separate results combined
into one using UNION ALL statements.
Summarize the selected data by the items listed such that one row is returned per unique
combination of values, and also get various sub-totals, plus a grand-total. Depending on
what exactly is wanted, this statement can be written as a ROLLUP, or a CUBE.
To illustrate the above concepts, imagine that we want to group some company data by team,
department, and division. The possible sub-totals and totals that we might want to get are:
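GROUP BY division, department, team
GROUP BY division, department
GROUP BY division
GROUP BY division, team
GROUP BY department, team
GROUP BY department
GROUP BY team
GROUP BY ()     <= grand-total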
Single vs. double parentheses is a very big deal in grouping sets. When using the former,
one is listing multiple independent groupings, while with the latter one is listing the set of
items in a particular grouping.
EMPLOYEE_VIEW (sample view): four columns, named D1, DEPT, SEX, and SALARY.
SELECT *
FROM   employee_view
ORDER BY 1,2,3,4;

ANSWER
==================
D1 DEPT SEX SALARY
-- ---- --- ------
A  A00  F    52750
A  A00  M    29250
A  A00  M    46500
B  B01  M    41250
C  C01  F    23800
C  C01  F    28420
C  C01  F    38250
D  D11  F    21340
D  D11  F    22250
D  D11  F    29840
D  D11  M    18270
D  D11  M    20450
D  D11  M    24680
D  D11  M    25280
D  D11  M    27740
D  D11  M    32250
A simple GROUP BY is used to combine individual rows into a distinct set of summary rows.
Sample Queries
In this first query we group our sample data by the leftmost three fields in the view:
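SELECT d1
      ,dept
      ,sex
      ,SUM(salary)        AS salary
      ,SMALLINT(COUNT(*)) AS #rows
FROM   employee_view
GROUP BY d1
        ,dept
        ,sex
ORDER BY d1
        ,dept
        ,sex;

ANSWER
========================
D1 DEPT SEX SALARY #ROWS
-- ---- --- ------ -----
A  A00  F    52750     1
A  A00  M    75750     2
B  B01  M    41250     1
C  C01  F    90470     3
D  D11  F    73430     3
D  D11  M   148670     6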
SELECT sex
      ,SUM(salary)        AS salary
      ,SMALLINT(COUNT(*)) AS #rows
FROM   employee_view
WHERE  sex IN ('F','M')
GROUP BY dept
        ,sex
ORDER BY sex;

ANSWER
================
SEX SALARY #ROWS
--- ------ -----
F    52750     1
F    90470     3
F    73430     3
M    75750     2
M    41250     1
M   148670     6
SELECT SUM(salary)        AS salary
      ,SMALLINT(COUNT(*)) AS #rows
FROM   employee_view
WHERE  d1 <> 'X'
GROUP BY SUBSTR(dept,3,1)
HAVING COUNT(*) <> 99;

ANSWER
============
SALARY #ROWS
------ -----
128500     3
353820    13
SELECT SUBSTR(dept,3,1)   AS wpart
      ,SUM(salary)        AS salary
      ,SMALLINT(COUNT(*)) AS #rows
FROM   employee_view
GROUP BY SUBSTR(dept,3,1)
ORDER BY wpart DESC;

ANSWER
==================
WPART SALARY #ROWS
----- ------ -----
1     353820    13
0     128500     3
The GROUPING SETS statement enables one to get multiple GROUP BY result sets using a
single statement. It is important to understand the difference between nested (i.e. in secondary
parentheses), and non-nested GROUPING SETS sub-phrases:
A non-nested list of columns works as separate simple GROUP BY statements, which are
then combined in an implied UNION ALL.
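GROUP BY GROUPING SETS ((A,B,C))
is equivalent to
GROUP BY A
        ,B
        ,C

GROUP BY GROUPING SETS (A,B,C)
is equivalent to
GROUP BY A
UNION ALL
GROUP BY B
UNION ALL
GROUP BY C

GROUP BY GROUPING SETS (A,(B,C))
is equivalent to
GROUP BY A
UNION ALL
GROUP BY B
        ,C

Multiple GROUPING SETS in the same GROUP BY are combined, as if each were a simple field in a GROUP BY list:

GROUP BY GROUPING SETS (A)
        ,GROUPING SETS (B)
        ,GROUPING SETS (C)
is equivalent to
GROUP BY A
        ,B
        ,C

GROUP BY GROUPING SETS (A)
        ,GROUPING SETS ((B,C))
is equivalent to
GROUP BY A
        ,B
        ,C

GROUP BY GROUPING SETS (A)
        ,GROUPING SETS (B,C)
is equivalent to
GROUP BY A
        ,B
UNION ALL
GROUP BY A
        ,C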
One can mix simple expressions and GROUPING SETS in the same GROUP BY:
GROUP BY A
        ,GROUPING SETS ((B,C))
is equivalent to
GROUP BY A
        ,B
        ,C

GROUP BY A
        ,B
        ,GROUPING SETS ((B,C))
is equivalent to
GROUP BY A
        ,B
        ,C

GROUP BY A
        ,B
        ,GROUPING SETS (B,C)
is equivalent to
GROUP BY A
        ,B
        ,C
UNION ALL
GROUP BY A
        ,B

GROUP BY A
        ,B
        ,C
        ,GROUPING SETS (B,C)
is equivalent to
GROUP BY A
        ,B
        ,C
UNION ALL
GROUP BY A
        ,B
        ,C

GROUP BY GROUPING SETS ((A,B,C),(A,B),(C))
is equivalent to
GROUP BY A
        ,B
        ,C
UNION ALL
GROUP BY A
        ,B
UNION ALL
GROUP BY C

GROUP BY GROUPING SETS ((A),(B,C),(A),(A),(C))
is equivalent to
GROUP BY A
UNION ALL
GROUP BY B
        ,C
UNION ALL
GROUP BY A
UNION ALL
GROUP BY A
UNION ALL
GROUP BY C
GROUP BY GROUPING SETS ((A,B,C),(A,B),(A),())
is equivalent to
GROUP BY ROLLUP(A,B,C)
is equivalent to
GROUP BY A
        ,B
        ,C
UNION ALL
GROUP BY A
        ,B
UNION ALL
GROUP BY A
UNION ALL
grand-total

GROUP BY GROUPING SETS ((A,B,C),(A,B),(A,C),(B,C),(A),(B),(C),())
is equivalent to
GROUP BY CUBE(A,B,C)
is equivalent to
GROUP BY A
        ,B
        ,C
UNION ALL
GROUP BY A
        ,B
UNION ALL
GROUP BY A
        ,C
UNION ALL
GROUP BY B
        ,C
UNION ALL
GROUP BY A
UNION ALL
GROUP BY B
UNION ALL
GROUP BY C
UNION ALL
grand-total
This first example has two GROUPING SETS. Because the second is in nested parentheses,
the result is the same as a simple three-field group by:
SELECT d1
      ,dept
      ,sex
      ,SUM(salary)        AS sal
      ,SMALLINT(COUNT(*)) AS #r
      ,GROUPING(d1)       AS f1
      ,GROUPING(dept)     AS fd
      ,GROUPING(sex)      AS fs
FROM   employee_view
GROUP BY GROUPING SETS (d1)
        ,GROUPING SETS ((dept,sex))
ORDER BY d1
        ,dept
        ,sex;

ANSWER
==============================
D1 DEPT SEX    SAL #R F1 FD FS
-- ---- --- ------ -- -- -- --
A  A00  F    52750  1  0  0  0
A  A00  M    75750  2  0  0  0
B  B01  M    41250  1  0  0  0
C  C01  F    90470  3  0  0  0
D  D11  F    73430  3  0  0  0
D  D11  M   148670  6  0  0  0
In the next query, the second GROUPING SET is not in nested-parenthesis. The query is
therefore equivalent to GROUP BY D1, DEPT UNION ALL GROUP BY D1, SEX:
SELECT d1
      ,dept
      ,sex
      ,SUM(salary)        AS sal
      ,SMALLINT(COUNT(*)) AS #r
      ,GROUPING(d1)       AS f1
      ,GROUPING(dept)     AS fd
      ,GROUPING(sex)      AS fs
FROM   employee_view
GROUP BY GROUPING SETS (d1)
        ,GROUPING SETS (dept,sex)
ORDER BY d1
        ,dept
        ,sex;

ANSWER
==============================
D1 DEPT SEX    SAL #R F1 FD FS
-- ---- --- ------ -- -- -- --
A  A00  -   128500  3  0  0  1
A  -    F    52750  1  0  1  0
A  -    M    75750  2  0  1  0
B  B01  -    41250  1  0  0  1
B  -    M    41250  1  0  1  0
C  C01  -    90470  3  0  0  1
C  -    F    90470  3  0  1  0
D  D11  -   222100  9  0  0  1
D  -    F    73430  3  0  1  0
D  -    M   148670  6  0  1  0
In the first, the repetition is ignored, because what is created is an ordinary GROUP BY
on all three fields.
In the second, repetition is important, because two GROUP BY statements are implicitly
generated. The first is on D1 and DEPT. The second is on D1, DEPT, and SEX.
SELECT d1
      ,dept
      ,sex
      ,SUM(salary)        AS sal
      ,SMALLINT(COUNT(*)) AS #r
      ,GROUPING(d1)       AS f1
      ,GROUPING(dept)     AS fd
      ,GROUPING(sex)      AS fs
FROM   employee_view
GROUP BY d1
        ,dept
        ,GROUPING SETS ((dept,sex))
ORDER BY d1
        ,dept
        ,sex;

ANSWER
==============================
D1 DEPT SEX    SAL #R F1 FD FS
-- ---- --- ------ -- -- -- --
A  A00  F    52750  1  0  0  0
A  A00  M    75750  2  0  0  0
B  B01  M    41250  1  0  0  0
C  C01  F    90470  3  0  0  0
D  D11  F    73430  3  0  0  0
D  D11  M   148670  6  0  0  0
SELECT d1
      ,dept
      ,sex
      ,SUM(salary)        AS sal
      ,SMALLINT(COUNT(*)) AS #r
      ,GROUPING(d1)       AS f1
      ,GROUPING(dept)     AS fd
      ,GROUPING(sex)      AS fs
FROM   employee_view
GROUP BY d1
        ,dept
        ,GROUPING SETS (dept,sex)
ORDER BY d1
        ,dept
        ,sex;

ANSWER
==============================
D1 DEPT SEX    SAL #R F1 FD FS
-- ---- --- ------ -- -- -- --
A  A00  F    52750  1  0  0  0
A  A00  M    75750  2  0  0  0
A  A00  -   128500  3  0  0  1
B  B01  M    41250  1  0  0  0
B  B01  -    41250  1  0  0  1
C  C01  F    90470  3  0  0  0
C  C01  -    90470  3  0  0  1
D  D11  F    73430  3  0  0  0
D  D11  M   148670  6  0  0  0
D  D11  -   222100  9  0  0  1
GROUP BY d1
        ,dept
        ,GROUPING SETS ((dept,sex))
is equivalent to
GROUP BY d1
        ,dept
        ,sex

GROUP BY d1
        ,dept
        ,GROUPING SETS (dept,sex)
is equivalent to
GROUP BY d1
        ,dept
        ,sex
UNION ALL
GROUP BY d1
        ,dept
        ,dept
ROLLUP Statement
A ROLLUP expression displays sub-totals for the specified fields. This is equivalent to doing
the original GROUP BY, and also doing more groupings on sets of the left-most columns.
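GROUP BY ROLLUP(A,B,C)   ===>   GROUP BY GROUPING SETS((A,B,C)
                                                      ,(A,B)
                                                      ,(A)
                                                      ,())
GROUP BY ROLLUP(C,B)     ===>   GROUP BY GROUPING SETS((C,B)
                                                      ,(C)
                                                      ,())
GROUP BY ROLLUP(A)       ===>   GROUP BY GROUPING SETS((A)
                                                      ,())

GROUP BY ROLLUP(A)
        ,ROLLUP(B,C)
===>
GROUP BY GROUPING SETS((A)
                      ,())
        ,GROUPING SETS((B,C)
                      ,(B)
                      ,())
===>
GROUP BY GROUPING SETS((A,B,C)
                      ,(A,B)
                      ,(A)
                      ,(B,C)
                      ,(B)
                      ,())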
SQL Examples
SELECT dept
      ,SUM(salary)        AS salary
      ,SMALLINT(COUNT(*)) AS #rows
      ,GROUPING(dept)     AS fd
FROM   employee_view
GROUP BY dept
ORDER BY dept;

ANSWER
====================
DEPT SALARY #ROWS FD
---- ------ ----- --
A00  128500     3  0
B01   41250     1  0
C01   90470     3  0
D11  222100     9  0
SELECT dept
      ,SUM(salary)        AS salary
      ,SMALLINT(COUNT(*)) AS #rows
      ,GROUPING(dept)     AS fd
FROM   employee_view
GROUP BY ROLLUP(dept)
ORDER BY dept;

ANSWER
====================
DEPT SALARY #ROWS FD
---- ------ ----- --
A00  128500     3  0
B01   41250     1  0
C01   90470     3  0
D11  222100     9  0
-    482320    16  1
Alternatively, we could do things the old-fashioned way and use a UNION ALL to combine
the original GROUP BY with an all-row summary:
SELECT dept
      ,SUM(salary)           AS salary
      ,SMALLINT(COUNT(*))    AS #rows
      ,GROUPING(dept)        AS fd
FROM   employee_view
GROUP BY dept
UNION ALL
SELECT CAST(NULL AS CHAR(3)) AS dept
      ,SUM(salary)           AS salary
      ,SMALLINT(COUNT(*))    AS #rows
      ,CAST(1 AS INTEGER)    AS fd
FROM   employee_view
ORDER BY dept;

ANSWER
====================
DEPT SALARY #ROWS FD
---- ------ ----- --
A00  128500     3  0
B01   41250     1  0
C01   90470     3  0
D11  222100     9  0
-    482320    16  1
SELECT dept
      ,SUM(salary)        AS salary
      ,SMALLINT(COUNT(*)) AS #rows
      ,GROUPING(dept)     AS fd
FROM   employee_view
GROUP BY dept
        ,ROLLUP(dept)
ORDER BY dept;

ANSWER
====================
DEPT SALARY #ROWS FD
---- ------ ----- --
A00  128500     3  0
A00  128500     3  0
B01   41250     1  0
B01   41250     1  0
C01   90470     3  0
C01   90470     3  0
D11  222100     9  0
D11  222100     9  0
Below is a graphic representation of why the data rows were repeated above. Observe that
two GROUP BY statements were, in effect, generated:
GROUP BY dept           =>  GROUP BY dept                  =>  GROUP BY dept
        ,ROLLUP(dept)               ,GROUPING SETS((dept)      UNION ALL
                                                  ,())         GROUP BY dept
                                                                       ,()
SELECT dept
      ,sex
      ,SUM(salary)        AS salary
      ,SMALLINT(COUNT(*)) AS #rows
      ,GROUPING(dept)     AS fd
      ,GROUPING(sex)      AS fs
FROM   employee_view
GROUP BY dept
        ,ROLLUP(sex)
ORDER BY dept
        ,sex;

ANSWER
===========================
DEPT SEX SALARY #ROWS FD FS
---- --- ------ ----- -- --
A00  F    52750     1  0  0
A00  M    75750     2  0  0
A00  -   128500     3  0  1
B01  M    41250     1  0  0
B01  -    41250     1  0  1
C01  F    90470     3  0  0
C01  -    90470     3  0  1
D11  F    73430     3  0  0
D11  M   148670     6  0  0
D11  -   222100     9  0  1
The work-department and sex field combined (i.e. the original raw GROUP BY).
SELECT dept
      ,sex
      ,SUM(salary)        AS salary
      ,SMALLINT(COUNT(*)) AS #rows
      ,GROUPING(dept)     AS fd
      ,GROUPING(sex)      AS fs
FROM   employee_view
GROUP BY ROLLUP(dept
               ,sex)
ORDER BY dept
        ,sex;

ANSWER
===========================
DEPT SEX SALARY #ROWS FD FS
---- --- ------ ----- -- --
A00  F    52750     1  0  0
A00  M    75750     2  0  0
A00  -   128500     3  0  1
B01  M    41250     1  0  0
B01  -    41250     1  0  1
C01  F    90470     3  0  0
C01  -    90470     3  0  1
D11  F    73430     3  0  0
D11  M   148670     6  0  0
D11  -   222100     9  0  1
-    -   482320    16  1  1
SELECT sex
      ,dept
      ,SUM(salary)        AS salary
      ,SMALLINT(COUNT(*)) AS #rows
      ,GROUPING(dept)     AS fd
      ,GROUPING(sex)      AS fs
FROM   employee_view
GROUP BY ROLLUP(sex
               ,dept)
ORDER BY sex
        ,dept;

ANSWER
===========================
SEX DEPT SALARY #ROWS FD FS
--- ---- ------ ----- -- --
F   A00   52750     1  0  0
F   C01   90470     3  0  0
F   D11   73430     3  0  0
F   -    216650     7  1  0
M   A00   75750     2  0  0
M   B01   41250     1  0  0
M   D11  148670     6  0  0
M   -    265670     9  1  0
-   -    482320    16  1  1
SELECT sex
      ,dept
      ,SUM(salary)        AS salary
      ,SMALLINT(COUNT(*)) AS #rows
      ,GROUPING(dept)     AS fd
      ,GROUPING(sex)      AS fs
FROM   employee_view
GROUP BY GROUPING SETS ((sex, dept)
                       ,(sex)
                       ,())
ORDER BY sex
        ,dept;

ANSWER
===========================
SEX DEPT SALARY #ROWS FD FS
--- ---- ------ ----- -- --
F   A00   52750     1  0  0
F   C01   90470     3  0  0
F   D11   73430     3  0  0
F   -    216650     7  1  0
M   A00   75750     2  0  0
M   B01   41250     1  0  0
M   D11  148670     6  0  0
M   -    265670     9  1  0
-   -    482320    16  1  1
The two together make a (single) combined summary row of all matching data. This query is
the same as a UNION of the two individual rollups, but it has the advantage of being done in
a single pass of the data. The result is the same as a CUBE of the two fields:
SELECT sex
      ,dept
      ,SUM(salary)        AS salary
      ,SMALLINT(COUNT(*)) AS #rows
      ,GROUPING(dept)     AS fd
      ,GROUPING(sex)      AS fs
FROM   employee_view
GROUP BY ROLLUP(sex)
        ,ROLLUP(dept)
ORDER BY sex
        ,dept;

ANSWER
===========================
SEX DEPT SALARY #ROWS FD FS
--- ---- ------ ----- -- --
F   A00   52750     1  0  0
F   C01   90470     3  0  0
F   D11   73430     3  0  0
F   -    216650     7  1  0
M   A00   75750     2  0  0
M   B01   41250     1  0  0
M   D11  148670     6  0  0
M   -    265670     9  1  0
-   A00  128500     3  0  1
-   B01   41250     1  0  1
-   C01   90470     3  0  1
-   D11  222100     9  0  1
-   -    482320    16  1  1
SELECT dept
      ,sex
      ,SUM(salary)        AS salary
      ,SMALLINT(COUNT(*)) AS #rows
      ,GROUPING(dept)     AS fd
      ,GROUPING(sex)      AS fs
FROM   employee_view
GROUP BY ROLLUP((dept,sex))
ORDER BY dept
        ,sex;

ANSWER
===========================
DEPT SEX SALARY #ROWS FD FS
---- --- ------ ----- -- --
A00  F    52750     1  0  0
A00  M    75750     2  0  0
B01  M    41250     1  0  0
C01  F    90470     3  0  0
D11  F    73430     3  0  0
D11  M   148670     6  0  0
-    -   482320    16  1  1
SELECT SUM(salary)        AS salary
      ,SMALLINT(COUNT(*)) AS #rows
FROM   employee_view
GROUP BY ROLLUP(sex
               ,dept)
HAVING GROUPING(dept) = 1
AND    GROUPING(sex)  = 1
ORDER BY salary;

ANSWER
============
SALARY #ROWS
------ -----
482320    16

SELECT SUM(salary)        AS salary
      ,SMALLINT(COUNT(*)) AS #rows
FROM   employee_view
GROUP BY GROUPING SETS(());

ANSWER
============
SALARY #ROWS
------ -----
482320    16

SELECT SUM(salary)        AS salary
      ,SMALLINT(COUNT(*)) AS #rows
FROM   employee_view
GROUP BY ();

ANSWER
============
SALARY #ROWS
------ -----
482320    16

SELECT SUM(salary)        AS salary
      ,SMALLINT(COUNT(*)) AS #rows
FROM   employee_view;

ANSWER
============
SALARY #ROWS
------ -----
482320    16
A CUBE expression displays a cross-tabulation of the sub-totals for any specified fields. As
such, it generates many more totals than the similar ROLLUP.
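GROUP BY CUBE(A,B,C)   ===>   GROUP BY GROUPING SETS((A,B,C),(A,B),(A,C),(B,C)
                                                    ,(A),(B),(C),())
GROUP BY CUBE(C,B)     ===>   GROUP BY GROUPING SETS((C,B),(C),(B),())
GROUP BY CUBE(A)       ===>   GROUP BY GROUPING SETS((A),())

GROUP BY CUBE(A,B)
        ,CUBE(B,C)
===>
GROUP BY GROUPING SETS((A,B),(A),(B),())
        ,GROUPING SETS((B,C),(B),(C),())
===>
GROUP BY GROUPING SETS((A,B,C),(A,B),(A,B,C),(A,B)
                      ,(A,B,C),(A,B),(A,C),(A)
                      ,(B,C),(B),(B,C),(B)
                      ,(B,C),(B),(C),())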
SELECT d1
      ,dept
      ,sex
      ,INT(SUM(salary))   AS sal
      ,SMALLINT(COUNT(*)) AS #r
      ,GROUPING(d1)       AS f1
      ,GROUPING(dept)     AS fd
      ,GROUPING(sex)      AS fs
FROM   employee_view
GROUP BY CUBE(d1, dept, sex)
ORDER BY d1
        ,dept
        ,sex;

ANSWER
==============================
D1 DEPT SEX    SAL #R F1 FD FS
-- ---- --- ------ -- -- -- --
A  A00  F    52750  1  0  0  0
A  A00  M    75750  2  0  0  0
A  A00  -   128500  3  0  0  1
A  -    F    52750  1  0  1  0
A  -    M    75750  2  0  1  0
A  -    -   128500  3  0  1  1
B  B01  M    41250  1  0  0  0
B  B01  -    41250  1  0  0  1
B  -    M    41250  1  0  1  0
B  -    -    41250  1  0  1  1
C  C01  F    90470  3  0  0  0
C  C01  -    90470  3  0  0  1
C  -    F    90470  3  0  1  0
C  -    -    90470  3  0  1  1
D  D11  F    73430  3  0  0  0
D  D11  M   148670  6  0  0  0
D  D11  -   222100  9  0  0  1
D  -    F    73430  3  0  1  0
D  -    M   148670  6  0  1  0
D  -    -   222100  9  0  1  1
-  A00  F    52750  1  1  0  0
-  A00  M    75750  2  1  0  0
-  A00  -   128500  3  1  0  1
-  B01  M    41250  1  1  0  0
-  B01  -    41250  1  1  0  1
-  C01  F    90470  3  1  0  0
-  C01  -    90470  3  1  0  1
-  D11  F    73430  3  1  0  0
-  D11  M   148670  6  1  0  0
-  D11  -   222100  9  1  0  1
-  -    F   216650  7  1  1  0
-  -    M   265670  9  1  1  0
-  -    -   482320 16  1  1  1
SELECT d1
      ,dept
      ,sex
      ,INT(SUM(salary))   AS sal
      ,SMALLINT(COUNT(*)) AS #r
      ,GROUPING(d1)       AS f1
      ,GROUPING(dept)     AS fd
      ,GROUPING(sex)      AS fs
FROM   employee_view
GROUP BY GROUPING SETS ((d1, dept, sex)
                       ,(d1,dept)
                       ,(d1,sex)
                       ,(dept,sex)
                       ,(d1)
                       ,(dept)
                       ,(sex)
                       ,())
ORDER BY d1
        ,dept
        ,sex;

ANSWER
==============================
D1 DEPT SEX    SAL #R F1 FD FS
-- ---- --- ------ -- -- -- --
A  A00  F    52750  1  0  0  0
A  A00  M    75750  2  0  0  0
etc... (same as prior query)
SELECT d1
      ,dept
      ,sex
      ,INT(SUM(salary))   AS sal
      ,SMALLINT(COUNT(*)) AS #r
      ,GROUPING(d1)       AS f1
      ,GROUPING(dept)     AS fd
      ,GROUPING(sex)      AS fs
FROM   employee_view
GROUP BY CUBE((d1, dept, sex))
ORDER BY d1
        ,dept
        ,sex;

ANSWER
==============================
D1 DEPT SEX    SAL #R F1 FD FS
-- ---- --- ------ -- -- -- --
A  A00  F    52750  1  0  0  0
A  A00  M    75750  2  0  0  0
B  B01  M    41250  1  0  0  0
C  C01  F    90470  3  0  0  0
D  D11  F    73430  3  0  0  0
D  D11  M   148670  6  0  0  0
-  -    -   482320 16  1  1  1
GROUP BY CUBE((A,B,C))  =>  GROUP BY GROUPING SETS((A,B,C)  =>  GROUP BY A
                                                  ,())                  ,B
                                                                        ,C
                                                                UNION ALL
                                                                GROUP BY()
Many of the more complicated SQL statements illustrated above are essentially unreadable
because it is very hard to tell what combinations of fields are being rolled up, and what are
not. There ought to be a more user-friendly way and, fortunately, there is. The CUBE command can be used to roll up everything. Then one can use ordinary SQL predicates to select
only those totals and sub-totals that one wants to display.
NOTE: Queries with multiple complicated ROLLUP and/or GROUPING SET statements
sometimes fail to compile. In which case, this method can be used to get the answer.
To illustrate this technique, consider the following query. It summarizes the data in the sample view by three fields:
SELECT d1                 AS d1
      ,dept               AS dpt
      ,sex                AS sx
      ,INT(SUM(salary))   AS sal
      ,SMALLINT(COUNT(*)) AS r
FROM   employee_view
GROUP BY d1
        ,dept
        ,sex
ORDER BY 1,2,3;

ANSWER
==================
D1 DPT SX    SAL R
-- --- -- ------ -
A  A00 F   52750 1
A  A00 M   75750 2
B  B01 M   41250 1
C  C01 F   90470 3
D  D11 F   73430 3
D  D11 M  148670 6
DESIRED SUB-TOTALS
==================
D1, DEPT, and SEX
D1 and DEPT
D1 and SEX
D1
SEX
Grand total

EQUIVALENT TO
=====================================
GROUP BY GROUPING SETS ((d1,dept,sex)
                       ,(d1,dept)
                       ,(d1,sex)
                       ,(d1)
                       ,(sex)
                       ,())

EQUIVALENT TO
========================
GROUP BY ROLLUP(d1,dept)
        ,ROLLUP(sex)
Rather than use either of the syntaxes shown on the right above, below we use the CUBE expression to get all sub-totals, and then select those that we want:
SELECT *
FROM  (SELECT d1                        AS d1
             ,dept                      AS dpt
             ,sex                       AS sx
             ,INT(SUM(salary))          AS sal
             ,SMALLINT(COUNT(*))        AS #r
             ,SMALLINT(GROUPING(d1))    AS g1
             ,SMALLINT(GROUPING(dept))  AS gd
             ,SMALLINT(GROUPING(sex))   AS gs
       FROM   employee_view
       GROUP BY CUBE(d1,dept,sex)
      )AS xxx
WHERE  (g1,gd,gs) = (0,0,0)
OR     (g1,gd,gs) = (0,0,1)
OR     (g1,gd,gs) = (0,1,0)
OR     (g1,gd,gs) = (0,1,1)
OR     (g1,gd,gs) = (1,1,0)
OR     (g1,gd,gs) = (1,1,1)
ORDER BY 1,2,3;

ANSWER
============================
D1 DPT SX    SAL #R G1 GD GS
-- --- -- ------ -- -- -- --
A  A00 F   52750  1  0  0  0
A  A00 M   75750  2  0  0  0
A  A00 -  128500  3  0  0  1
A  -   F   52750  1  0  1  0
A  -   M   75750  2  0  1  0
A  -   -  128500  3  0  1  1
B  B01 M   41250  1  0  0  0
B  B01 -   41250  1  0  0  1
B  -   M   41250  1  0  1  0
B  -   -   41250  1  0  1  1
C  C01 F   90470  3  0  0  0
C  C01 -   90470  3  0  0  1
C  -   F   90470  3  0  1  0
C  -   -   90470  3  0  1  1
D  D11 F   73430  3  0  0  0
D  D11 M  148670  6  0  0  0
D  D11 -  222100  9  0  0  1
D  -   F   73430  3  0  1  0
D  -   M  148670  6  0  1  0
D  -   -  222100  9  0  1  1
-  -   F  216650  7  1  1  0
-  -   M  265670  9  1  1  0
-  -   -  482320 16  1  1  1
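(g1,gd,gs) = (0,0,0)   <==  D1, DEPT, SEX
(g1,gd,gs) = (0,0,1)   <==  D1, DEPT
(g1,gd,gs) = (0,1,0)   <==  D1, SEX
(g1,gd,gs) = (0,1,1)   <==  D1
(g1,gd,gs) = (1,1,0)   <==  SEX
(g1,gd,gs) = (1,1,1)   <==  grand total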
SELECT d1
      ,dept
      ,sex
      ,INT(SUM(salary))   AS sal
      ,SMALLINT(COUNT(*)) AS #r
FROM   employee_view
GROUP BY ROLLUP(d1,dept)
        ,ROLLUP(sex)
ORDER BY 1,2,3;

ANSWER
=====================
D1 DEPT SEX    SAL #R
-- ---- --- ------ --
A  A00  F    52750  1
A  A00  M    75750  2
A  A00  -   128500  3
A  -    F    52750  1
A  -    M    75750  2
A  -    -   128500  3
B  B01  M    41250  1
B  B01  -    41250  1
B  -    M    41250  1
B  -    -    41250  1
C  C01  F    90470  3
C  C01  -    90470  3
C  -    F    90470  3
C  -    -    90470  3
D  D11  F    73430  3
D  D11  M   148670  6
D  D11  -   222100  9
D  -    F    73430  3
D  -    M   148670  6
D  -    -   222100  9
-  -    F   216650  7
-  -    M   265670  9
-  -    -   482320 16
One should never assume that the result of a GROUP BY will be a set of appropriately ordered rows because DB2 may choose to use a "strange" index for the grouping so as to avoid
doing a row sort. For example, if one says "GROUP BY C1, C2" and the only suitable index
is on C2 descending and then C1, the data will probably come back in index-key order.
SELECT dept, job
      ,COUNT(*)
FROM   staff
GROUP BY dept, job
ORDER BY dept, job;
Group By in Join
We want to select those rows in the STAFF table where the average SALARY for the employee's DEPT is greater than $18,000. Answering this question requires using a JOIN and
GROUP BY in the same statement. The GROUP BY will have to be done first, then its result
will be joined to the STAFF table.
There are two syntactically different, but technically similar, ways to write this query. Both
techniques use a temporary table, but the way by which this is expressed differs. In the first
example, we shall use a common table expression:
ANSWER
=================
ID  NAME     DEPT
--- -------- ----
160 Molinare   10
210 Lu         10
240 Daniels    10
260 Jones      10
Figure 558, GROUP BY on one side of join - using common table expression
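As a sketch of the common-table-expression form (the temporary table name STAFF2 is an assumption; the column names and the $18,000 threshold come from the text and from the full-select version of the query), the grouping is done first in the WITH phrase, and its result is then joined to the STAFF table:

```sql
-- Step 1: get the departments whose average salary exceeds 18000.
WITH staff2 (dept, avgsal) AS        -- STAFF2 is a hypothetical name
  (SELECT   dept
           ,AVG(salary)
   FROM     staff
   GROUP BY dept
   HAVING   AVG(salary) > 18000
  )
-- Step 2: join that summary back to the individual STAFF rows.
SELECT   a.id
        ,a.name
        ,a.dept
FROM     staff  a
        ,staff2 b
WHERE    a.dept = b.dept
ORDER BY a.id;
```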
In the next example, we shall use a full-select:
SELECT a.id
      ,a.name
      ,a.dept
FROM   staff a
     ,(SELECT dept        AS dept
             ,AVG(salary) AS avgsal
       FROM   staff
       GROUP BY dept
       HAVING AVG(salary) > 18000
      )AS b
WHERE  a.dept = b.dept
ORDER BY a.id;

ANSWER
=================
ID  NAME     DEPT
--- -------- ----
160 Molinare   10
210 Lu         10
240 Daniels    10
260 Jones      10
When there are no matching rows, the value returned by the COUNT depends upon whether
there is a GROUP BY in the SQL statement or not:
SELECT COUNT(*) AS c1
FROM   staff
WHERE  id < 1;

ANSWER
======
0

SELECT COUNT(*) AS c1
FROM   staff
WHERE  id < 1
GROUP BY id;

ANSWER
======
no row
Joins
A join is used to relate sets of rows in two or more logical tables. The tables are always joined
on a row-by-row basis using whatever join criteria are provided in the query. The result of a
join is always a new, albeit possibly empty, set of rows.
In a join, the matching rows are joined side-by-side to make the result table. By contrast, in a
union (see page 243) the matching rows are joined (in a sense) one-above-the-other to make
the result table.
Why Joins Matter
The most important data in a relational database is not that stored in the individual rows.
Rather, it is the implied relationships between sets of related rows. For example, individual
rows in an EMPLOYEE table may contain the employee ID and salary - both of which are
very important data items. However, it is the set of all rows in the same table that gives the
gross wages for the whole company, and it is the (implied) relationship between the EMPLOYEE and DEPARTMENT tables that enables one to get a breakdown of employees by
department and/or division.
Joins are important because one uses them to tease the relationships out of the database. They
are also important because they are very easy to get wrong.
Sample Views
CREATE VIEW staff_v1 AS
SELECT id, name
FROM   staff
WHERE  id BETWEEN 10 AND 30;

STAFF_V1         STAFF_V2
+-----------+    +---------+
|ID|NAME    |    |ID|JOB   |
|--|--------|    |--|------|
|10|Sanders |    |20|Sales |
|20|Pernal  |    |30|Clerk |
|30|Marenghi|    |30|Mgr   |
+-----------+    |40|Sales |
                 |50|Mgr   |
                 +---------+
Both views contain rows that have no corresponding ID in the other view.
Join Syntax
DB2 UDB SQL comes with two quite different ways to represent a join. Both syntax styles
will be shown throughout this section though, in truth, one of the styles is usually the better,
depending upon the situation.
The first style, which is only really suitable for inner joins, involves listing the tables to be
joined in a FROM statement. A comma separates each table name. A subsequent WHERE
statement constrains the join.
Join syntax #1 (diagram): SELECT ... FROM table name [correlation name] [, table name [correlation name] ...] WHERE join and other predicates
SELECT v1.id
      ,v1.name
      ,v2.job
FROM   staff_v1 v1
      ,staff_v2 v2
WHERE  v1.id = v2.id
ORDER BY v1.id
        ,v2.job;

JOIN ANSWER
=================
ID NAME     JOB
-- -------- -----
20 Pernal   Sales
30 Marenghi Clerk
30 Marenghi Mgr
SELECT v1.id
      ,v2.job
      ,v3.name
FROM   staff_v1 v1
      ,staff_v2 v2
      ,staff_v1 v3
WHERE  v1.id = v2.id
AND    v2.id = v3.id
AND    v3.name LIKE 'M%'
ORDER BY v1.name
        ,v2.job;

JOIN ANSWER
=================
ID JOB   NAME
-- ----- --------
30 Clerk Marenghi
30 Mgr   Marenghi
Join syntax #2 (diagram): SELECT ... FROM table name [c. name] [INNER | LEFT | RIGHT | FULL [OUTER]] JOIN table name ON join predicates ... WHERE join & other predicates
SELECT v1.id
      ,v1.name
      ,v2.job
FROM   staff_v1 v1
INNER JOIN
       staff_v2 v2
ON     v1.id = v2.id
ORDER BY v1.id
        ,v2.job;

JOIN ANSWER
=================
ID NAME     JOB
-- -------- -----
20 Pernal   Sales
30 Marenghi Clerk
30 Marenghi Mgr
SELECT v1.id
      ,v2.job
      ,v3.name
FROM   staff_v1 v1
JOIN   staff_v2 v2
ON     v1.id = v2.id
JOIN   staff_v1 v3
ON     v2.id = v3.id
WHERE  v3.name LIKE 'M%'
ORDER BY v1.name
        ,v2.job;

JOIN ANSWER
=================
ID JOB   NAME
-- ----- --------
30 Clerk Marenghi
30 Mgr   Marenghi

STAFF_V1         STAFF_V2
+-----------+    +---------+
|ID|NAME    |    |ID|JOB   |
|--|--------|    |--|------|
|10|Sanders |    |20|Sales |
|20|Pernal  |    |30|Clerk |
|30|Marenghi|    |30|Mgr   |
+-----------+    |40|Sales |
                 |50|Mgr   |
                 +---------+
A join written using the second syntax style shown above can have either, or both, ON and
WHERE checks. These two types of check work quite differently:
WHERE checks are used to filter rows, and to define the nature of the join. Only those
rows that match all WHERE checks are returned.
ON checks define the nature of the join. They are used to categorize rows as either joined
or not-joined, rather than to exclude rows from the answer-set, though they may do this in
some situations.
Let us illustrate this difference with a simple, if slightly silly, left outer join:
SELECT *
FROM   staff_v1 v1
LEFT OUTER JOIN
       staff_v2 v2
ON     1     = 1
AND    v1.id = v2.id
ORDER BY v1.id
        ,v2.job;

ANSWER
====================
ID NAME     ID JOB
-- -------- -- -----
10 Sanders  -  -
20 Pernal   20 Sales
30 Marenghi 30 Clerk
30 Marenghi 30 Mgr
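For contrast, here is a sketch (an assumption about the form of the statement, not necessarily the exact query used for the next answer) of the same join with the ID test moved from the ON check into a WHERE check. Because a WHERE check filters the answer-set, the unmatched STAFF_V1 row should no longer be returned:

```sql
-- Same silly join, but the ID comparison is now a WHERE check.
-- The ON check (1 = 1) matches every pair of rows; the WHERE check
-- then removes every pair whose IDs differ, including unmatched rows.
SELECT   *
FROM     staff_v1 v1
LEFT OUTER JOIN
         staff_v2 v2
ON       1     = 1
WHERE    v1.id = v2.id
ORDER BY v1.id
        ,v2.job;
```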
ANSWER
====================
ID NAME     ID JOB
-- -------- -- -----
20 Pernal   20 Sales
30 Marenghi 30 Clerk
30 Marenghi 30 Mgr
In an inner join, an ON check can exclude rows because it is used to define the nature of
the join and, by definition, in an inner join only matching rows are returned.
In a partial outer join, an ON check on the originating table does not exclude rows. It
simply categorizes each row as participating in the join or not.
In a partial outer join, an ON check on the table to be joined to can exclude rows because
if the row fails the test, it does not match the join.
In a full outer join, an ON check never excludes rows. It simply categorizes them as
matching the join or not.
Each of the above principles will be demonstrated as we look at the different types of join.
Join Types
A generic join matches one row with another to create a new compound row. Joins can be
categorized by the nature of the match between the joined rows. In this section we shall discuss each join type and how to code it in SQL.
Inner Join
An inner-join is another name for a standard join in which two sets of columns are joined by
matching those rows that have equal data values. Most of the joins that one writes will probably be of this kind and, assuming that suitable indexes have been created, they will almost
always be very efficient.
STAFF_V1         STAFF_V2
+-----------+    +---------+
|ID|NAME    |    |ID|JOB   |
|--|--------|    |--|------|
|10|Sanders |    |20|Sales |
|20|Pernal  |    |30|Clerk |
|30|Marenghi|    |30|Mgr   |
+-----------+    |40|Sales |
                 |50|Mgr   |
                 +---------+

Join on ID
==========>

INNER-JOIN ANSWER
====================
ID NAME     ID JOB
-- -------- -- -----
20 Pernal   20 Sales
30 Marenghi 30 Clerk
30 Marenghi 30 Mgr
SELECT *
FROM   staff_v1 v1
      ,staff_v2 v2
WHERE  v1.id = v2.id
ORDER BY v1.id
        ,v2.job;

ANSWER
====================
ID NAME     ID JOB
-- -------- -- -----
20 Pernal   20 Sales
30 Marenghi 30 Clerk
30 Marenghi 30 Mgr
ANSWER
====================
ID NAME     ID JOB
-- -------- -- -----
20 Pernal   20 Sales
30 Marenghi 30 Clerk
30 Marenghi 30 Mgr
In an inner join only, an ON and a WHERE check work much the same way. Both define the
nature of the join, and because in an inner join, only matching rows are returned, both act to
exclude all rows that do not match the join.
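The point can be sketched with a pair of statements that differ only in where the extra check is placed. This is an illustration under assumptions: the predicate on JOB is chosen to match the manager-excluding examples used elsewhere in this section. In an inner join, both forms should return the same rows:

```sql
-- Extra check coded as part of the ON.
SELECT   *
FROM     staff_v1 v1
INNER JOIN
         staff_v2 v2
ON       v1.id   = v2.id
AND      v2.job <> 'Mgr'
ORDER BY v1.id
        ,v2.job;

-- Same check coded as a WHERE. For an inner join the result is the same.
SELECT   *
FROM     staff_v1 v1
INNER JOIN
         staff_v2 v2
ON       v1.id   = v2.id
WHERE    v2.job <> 'Mgr'
ORDER BY v1.id
        ,v2.job;
```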
ANSWER
====================
ID NAME     ID JOB
-- -------- -- -----
20 Pernal   20 Sales
30 Marenghi 30 Clerk

ANSWER
====================
ID NAME     ID JOB
-- -------- -- -----
20 Pernal   20 Sales
30 Marenghi 30 Clerk
A left outer join is the same as saying that I want all of the rows in the first table listed, plus
any matching rows in the second table:
STAFF_V1         STAFF_V2
+-----------+    +---------+
|ID|NAME    |    |ID|JOB   |
|--|--------|    |--|------|
|10|Sanders |    |20|Sales |
|20|Pernal  |    |30|Clerk |
|30|Marenghi|    |30|Mgr   |
+-----------+    |40|Sales |
                 |50|Mgr   |
                 +---------+

=========>

LEFT-OUTER-JOIN ANSWER
======================
ID NAME     ID JOB
-- -------- -- -----
10 Sanders  -  -
20 Pernal   20 Sales
30 Marenghi 30 Clerk
30 Marenghi 30 Mgr
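A sketch of a statement that corresponds to the picture above, assuming the STAFF_V1 and STAFF_V2 sample views (the choice of ORDER BY columns is an assumption):

```sql
-- Every STAFF_V1 row is returned. STAFF_V2 columns are null
-- where no row with the same ID exists.
SELECT   *
FROM     staff_v1 v1
LEFT OUTER JOIN
         staff_v2 v2
ON       v1.id = v2.id
ORDER BY v1.id
        ,v2.job;
```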
SELECT v1.*
      ,v2.*
FROM   staff_v1 v1
      ,staff_v2 v2
WHERE  v1.id = v2.id
UNION
SELECT v1.*
      ,CAST(NULL AS SMALLINT) AS id
      ,CAST(NULL AS CHAR(5))  AS job
FROM   staff_v1 v1
WHERE  v1.id NOT IN
      (SELECT id FROM staff_v2)
ORDER BY 1,4;
In any type of join, a WHERE check works as if the join is an inner join. If no row matches,
then no row is returned, regardless of what table the predicate refers to. By contrast, in a left
or right outer join, an ON check works differently, depending on what table field it refers to:
If it refers to a field in the table being joined to, it determines whether the related row
matches the join or not.
If it refers to a field in the table being joined from, it determines whether the related row
finds a match or not. Regardless, the row will be returned.
In the next example, those rows in the table being joined to (i.e. the V2 view) that match on
ID, and that are not for a manager are joined to:
SELECT *
FROM   staff_v1 v1
LEFT OUTER JOIN
       staff_v2 v2
ON     v1.id   = v2.id
AND    v2.job <> 'Mgr'
ORDER BY v1.id
        ,v2.job;

ANSWER
====================
ID NAME     ID JOB
-- -------- -- -----
10 Sanders  -  -
20 Pernal   20 Sales
30 Marenghi 30 Clerk
ANSWER
====================
ID NAME     ID JOB
-- -------- -- -----
20 Pernal   20 Sales
30 Marenghi 30 Clerk

ANSWER
====================
ID NAME     ID JOB
-- -------- -- -----
10 Sanders  -  -
20 Pernal   20 Sales
30 Marenghi 30 Clerk

ANSWER
====================
ID NAME     ID JOB
-- -------- -- -----
10 Sanders  -  -
20 Pernal   20 Sales
30 Marenghi -  -
If we rewrite the above query using a WHERE check (on NAME) we will lose a row because
now the check excludes rows from the answer-set, rather than from participating in the join:
SELECT *
FROM   staff_v1 v1
LEFT OUTER JOIN
       staff_v2 v2
ON     v1.id   = v2.id
WHERE  v1.name > 'N'
ORDER BY v1.id
        ,v2.job;

ANSWER
====================
ID NAME     ID JOB
-- -------- -- -----
10 Sanders  -  -
20 Pernal   20 Sales
A right outer join is the inverse of a left outer join. One gets every row in the second table
listed, plus any matching rows in the first table:
STAFF_V1         STAFF_V2
+-----------+    +---------+
|ID|NAME    |    |ID|JOB   |
|--|--------|    |--|------|
|10|Sanders |    |20|Sales |
|20|Pernal  |    |30|Clerk |
|30|Marenghi|    |30|Mgr   |
+-----------+    |40|Sales |
                 |50|Mgr   |
                 +---------+

=========>

RIGHT-OUTER-JOIN ANSWER
=======================
ID NAME     ID JOB
-- -------- -- -----
20 Pernal   20 Sales
30 Marenghi 30 Clerk
30 Marenghi 30 Mgr
-  -        40 Sales
-  -        50 Mgr
ANSWER
====================
ID NAME
ID JOB
-- -------- -- ----20 Pernal
20 Sales
30 Marenghi 30 Clerk
30 Marenghi 30 Mgr
- 40 Sales
- 50 Mgr
SELECT   v1.*
        ,v2.*
FROM     staff_v1 v1
        ,staff_v2 v2
WHERE    v1.id = v2.id
UNION
SELECT   CAST(NULL AS SMALLINT)   AS id
        ,CAST(NULL AS VARCHAR(9)) AS name
        ,v2.*
FROM     staff_v2 v2
WHERE    v2.id NOT IN
         (SELECT id FROM staff_v1)
ORDER BY 3,4;
ANSWER
====================
ID NAME     ID JOB
-- -------- -- -----
20 Pernal   20 Sales
30 Marenghi 30 Clerk
30 Marenghi 30 Mgr
-  -        40 Sales
-  -        50 Mgr
The rules for ON and WHERE usage are the same in a right outer join as they are for a left
outer join (see page 212), except that the relevant tables are reversed.
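To make the reversal concrete, here is a sketch (not from the original figures) that writes a left outer join from V1 to V2 as a right outer join with the tables swapped. The common table expressions stand in for the two views, and the ID2 column alias is made up for this example:

```sql
WITH
staff_v1 (id, name) AS
  (VALUES (10,'Sanders'),(20,'Pernal'),(30,'Marenghi')),
staff_v2 (id, job) AS
  (VALUES (20,'Sales'),(30,'Clerk'),(30,'Mgr'),(40,'Sales'),(50,'Mgr'))
SELECT   v1.id
        ,v1.name
        ,v2.id AS id2
        ,v2.job
FROM     staff_v2 v2
RIGHT OUTER JOIN
         staff_v1 v1        -- V1 is now the preserved (second) table
ON       v1.id   = v2.id
AND      v2.job <> 'Mgr'    -- check on the table being joined to
ORDER BY v1.id
        ,v2.job;
```

Every V1 row should be returned, with the V2 manager row excluded from the matching, just as in the equivalent left outer join from V1 to V2.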
Full Outer Joins
A full outer join occurs when all of the matching rows in two tables are joined, and one copy
of each non-matching row in both tables is also returned.
STAFF_V1         STAFF_V2
+-----------+    +---------+
|ID|NAME    |    |ID|JOB   |
|--|--------|    |--|------|
|10|Sanders |    |20|Sales |
|20|Pernal  |    |30|Clerk |
|30|Marenghi|    |30|Mgr   |
+-----------+    |40|Sales |
                 |50|Mgr   |
                 +---------+
=========>
FULL-OUTER-JOIN ANSWER
======================
ID NAME     ID JOB
-- -------- -- -----
10 Sanders  -  -
20 Pernal   20 Sales
30 Marenghi 30 Clerk
30 Marenghi 30 Mgr
-  -        40 Sales
-  -        50 Mgr
SELECT   v1.*
        ,v2.*
FROM     staff_v1 v1
        ,staff_v2 v2
WHERE    v1.id = v2.id
UNION
SELECT   v1.*
        ,CAST(NULL AS SMALLINT) AS id
        ,CAST(NULL AS CHAR(5))  AS job
FROM     staff_v1 v1
WHERE    v1.id NOT IN
         (SELECT id FROM staff_v2)
UNION
SELECT   CAST(NULL AS SMALLINT)   AS id
        ,CAST(NULL AS VARCHAR(9)) AS name
        ,v2.*
FROM     staff_v2 v2
WHERE    v2.id NOT IN
         (SELECT id FROM staff_v1)
ORDER BY 1,3,4;
ANSWER
====================
ID NAME     ID JOB
-- -------- -- -----
10 Sanders  -  -
20 Pernal   20 Sales
30 Marenghi 30 Clerk
30 Marenghi 30 Mgr
-  -        40 Sales
-  -        50 Mgr
In a full outer join, an ON check is quite unlike a WHERE check in that it never results in a
row being excluded from the answer set. All it does is categorize the input row as being either
matching or non-matching. For example, in the following full outer join, the ON check joins
those rows with equal key values:
SELECT   *
FROM     staff_v1 v1
FULL OUTER JOIN
         staff_v2 v2
ON       v1.id = v2.id
ORDER BY v1.id
        ,v2.id
        ,v2.job;
ANSWER
====================
ID NAME     ID JOB
-- -------- -- -----
10 Sanders  -  -
20 Pernal   20 Sales
30 Marenghi 30 Clerk
30 Marenghi 30 Mgr
-  -        40 Sales
-  -        50 Mgr

ANSWER
====================
ID NAME     ID JOB
-- -------- -- -----
10 Sanders  -  -
20 Pernal   -  -
30 Marenghi 30 Clerk
30 Marenghi 30 Mgr
-  -        20 Sales
-  -        40 Sales
-  -        50 Mgr

ANSWER
====================
ID NAME     ID JOB
-- -------- -- -----
10 Sanders  -  -
20 Pernal   -  -
30 Marenghi -  -
-  -        20 Sales
-  -        30 Clerk
-  -        30 Mgr
-  -        40 Sales
-  -        50 Mgr
Figure 591, Full Outer Join, match on keys (no rows match)
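The statement behind Figure 591 is not reproduced here. Judging from the caption and from the discussion that follows, it presumably looked something like the sketch below. Treat this as an assumption rather than the original text; the common table expressions stand in for the two views so that the statement can be run on its own:

```sql
WITH
staff_v1 (id, name) AS
  (VALUES (10,'Sanders'),(20,'Pernal'),(30,'Marenghi')),
staff_v2 (id, job) AS
  (VALUES (20,'Sales'),(30,'Clerk'),(30,'Mgr'),(40,'Sales'),(50,'Mgr'))
SELECT   *
FROM     staff_v1 v1
FULL OUTER JOIN
         staff_v2 v2
ON       v1.id = v2.id      -- first check: the fields to join on
AND      +1    = -1         -- second check: never true, so nothing matches
ORDER BY v1.id
        ,v2.id
        ,v2.job;
```

Because the second check can never be true, no pair of rows matches, and every row from both views should be returned once with nulls on the other side.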
ON checks are somewhat like WHERE checks in that they have two purposes. Within a table,
they are used to categorize rows as being either matching or non-matching. Between tables,
they are used to define the fields that are to be joined on.
In the prior example, the first ON check defined the fields to join on, while the second check
identified those rows that matched the join. Because nothing matched (due to the second
predicate), everything fell into the "outer join" category. This means that we can remove the
first ON check without altering the answer set:
SELECT   *
FROM     staff_v1 v1
FULL OUTER JOIN
         staff_v2 v2
ON       +1 = -1
ORDER BY v1.id
        ,v2.id
        ,v2.job;
ANSWER
====================
ID NAME     ID JOB
-- -------- -- -----
10 Sanders  -  -
20 Pernal   -  -
30 Marenghi -  -
-  -        20 Sales
-  -        30 Clerk
-  -        30 Mgr
-  -        40 Sales
-  -        50 Mgr
Figure 592, Full Outer Join, don't match on keys (no rows match)
What happens if everything matches and we don't identify the join fields? The result is a Cartesian Product:
SELECT   *
FROM     staff_v1 v1
FULL OUTER JOIN
         staff_v2 v2
ON       +1 <> -1
ORDER BY v1.id
        ,v2.id
        ,v2.job;
STAFF_V1         STAFF_V2
+-----------+    +---------+
|ID|NAME    |    |ID|JOB   |
|--|--------|    |--|------|
|10|Sanders |    |20|Sales |
|20|Pernal  |    |30|Clerk |
|30|Marenghi|    |30|Mgr   |
+-----------+    |40|Sales |
                 |50|Mgr   |
                 +---------+
ANSWER
====================
ID NAME     ID JOB
-- -------- -- -----
10 Sanders  20 Sales
10 Sanders  30 Clerk
10 Sanders  30 Mgr
10 Sanders  40 Sales
10 Sanders  50 Mgr
20 Pernal   20 Sales
20 Pernal   30 Clerk
20 Pernal   30 Mgr
20 Pernal   40 Sales
20 Pernal   50 Mgr
30 Marenghi 20 Sales
30 Marenghi 30 Clerk
30 Marenghi 30 Mgr
30 Marenghi 40 Sales
30 Marenghi 50 Mgr
Figure 593, Full Outer Join, don't match on keys (all rows match)
In an outer join, WHERE predicates behave as if they were written for an inner join. In particular, they always do the following:
WHERE predicates defining join fields enforce an inner join on those fields.
WHERE predicates on non-join fields are applied after the join, which means that when
they are used on not-null fields, they negate the outer join.
Here is an example of a WHERE join predicate turning an outer join into an inner join:
SELECT   *
FROM     staff_v1 v1
FULL JOIN
         staff_v2 v2
ON       v1.id = v2.id
WHERE    v1.id = v2.id
ORDER BY 1,3,4;
ANSWER
====================
ID NAME     ID JOB
-- -------- -- -----
20 Pernal   20 Sales
30 Marenghi 30 Clerk
30 Marenghi 30 Mgr
Figure 594, Full Outer Join, turned into an inner join by WHERE
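The other rule above (a WHERE predicate on a non-join field) can be sketched in the same way. This example is an illustration added here, with common table expressions standing in for the two views:

```sql
WITH
staff_v1 (id, name) AS
  (VALUES (10,'Sanders'),(20,'Pernal'),(30,'Marenghi')),
staff_v2 (id, job) AS
  (VALUES (20,'Sales'),(30,'Clerk'),(30,'Mgr'),(40,'Sales'),(50,'Mgr'))
SELECT   *
FROM     staff_v1 v1
FULL OUTER JOIN
         staff_v2 v2
ON       v1.id   = v2.id
WHERE    v2.job <> 'Mgr'    -- evaluates to unknown when V2.JOB is null
ORDER BY 1,3,4;
```

The Sanders row has no V2 match, so its JOB is null, the check is not true, and the row is dropped. The non-matching V2 row for ID 40 survives, because the field being tested comes from V2. In other words, the WHERE check undoes the outer join on the V1 side only.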
To illustrate some of the complications that WHERE checks can cause, imagine that we want
to do a FULL OUTER JOIN on our two test views (see below), limiting the answer to those
rows where the "V1 ID" field is less than 30. There are several ways to express this query,
each giving a different answer:
STAFF_V1         STAFF_V2
+-----------+    +---------+
|ID|NAME    |    |ID|JOB   |
|--|--------|    |--|------|
|10|Sanders |    |20|Sales |
|20|Pernal  |    |30|Clerk |
|30|Marenghi|    |30|Mgr   |
+-----------+    |40|Sales |
                 |50|Mgr   |
                 +---------+
OUTER-JOIN CRITERIA
==================>
V1.ID = V2.ID
V1.ID < 30
ANSWER
============
???, DEPENDS

ANSWER
====================
ID NAME     ID JOB
-- -------- -- -----
10 Sanders  -  -
20 Pernal   20 Sales
Figure 596, Outer join V1.ID < 30, check applied in WHERE (after join)
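The statement for Figure 596 is not reproduced here. Going by the caption (the check is applied in the WHERE, after the join), it was presumably along the lines of the following sketch. This is an assumption, written with common table expressions in place of the two views:

```sql
WITH
staff_v1 (id, name) AS
  (VALUES (10,'Sanders'),(20,'Pernal'),(30,'Marenghi')),
staff_v2 (id, job) AS
  (VALUES (20,'Sales'),(30,'Clerk'),(30,'Mgr'),(40,'Sales'),(50,'Mgr'))
SELECT   *
FROM     staff_v1 v1
FULL JOIN
         staff_v2 v2
ON       v1.id = v2.id
WHERE    v1.id < 30         -- applied after the join
ORDER BY 1,3,4;
```

The WHERE check removes every row where V1.ID is 30 or null, which is why only the Sanders and Pernal rows remain.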
In the next example the "V1.ID < 30" check is done during the outer join, where it does not
eliminate any rows, but rather limits those that match in the two views:
SELECT   *
FROM     staff_v1 v1
FULL JOIN
         staff_v2 v2
ON       v1.id = v2.id
AND      v1.id < 30
ORDER BY 1,3,4;
ANSWER
====================
ID NAME     ID JOB
-- -------- -- -----
10 Sanders  -  -
20 Pernal   20 Sales
30 Marenghi -  -
-  -        30 Clerk
-  -        30 Mgr
-  -        40 Sales
-  -        50 Mgr
Figure 597, Outer join V1.ID < 30, check applied in ON (during join)
Imagine that what we really wanted was for the "V1.ID < 30" check to apply only to those rows
in the "V1" table. Then one has to apply the check before the join, which requires the use of a
nested-table expression:
SELECT   *
FROM    (SELECT *
         FROM   staff_v1
         WHERE  id < 30) AS v1
FULL OUTER JOIN
         staff_v2 v2
ON       v1.id = v2.id
ORDER BY 1,3,4;
ANSWER
====================
ID NAME     ID JOB
-- -------- -- -----
10 Sanders  -  -
20 Pernal   20 Sales
-  -        30 Clerk
-  -        30 Mgr
-  -        40 Sales
-  -        50 Mgr
Figure 598, Outer join V1.ID < 30, check applied in WHERE (before join)
Observe how in the above query we still got a row back with an ID of 30, but it came from
the "V2" table. This makes sense, because the WHERE condition had been applied before we
got to this table.
There are several incorrect ways to answer the above question. In the first example, we shall
keep all non-matching V2 rows by allowing any null V1.ID values to pass:
SELECT   *
FROM     staff_v1 v1
FULL OUTER JOIN
         staff_v2 v2
ON       v1.id =  v2.id
WHERE    v1.id <  30
   OR    v1.id IS NULL
ORDER BY 1,3,4;
ANSWER
====================
ID NAME     ID JOB
-- -------- -- -----
10 Sanders  -  -
20 Pernal   20 Sales
-  -        40 Sales
-  -        50 Mgr
Figure 599, Outer join V1.ID < 30, (gives wrong answer - see text)
There are two problems with the above query: First, it is only appropriate to use when the
V1.ID field is defined as not null, which it is in this case. Second, we lost the row in the V2
table where the ID equaled 30. We can fix this latter problem, by adding another check, but
the answer is still wrong:
SELECT   *
FROM     staff_v1 v1
FULL OUTER JOIN
         staff_v2 v2
ON       v1.id =  v2.id
WHERE    v1.id <  30
   OR    v1.id =  v2.id
   OR    v1.id IS NULL
ORDER BY 1,3,4;
ANSWER
====================
ID NAME     ID JOB
-- -------- -- -----
10 Sanders  -  -
20 Pernal   20 Sales
30 Marenghi 30 Clerk
30 Marenghi 30 Mgr
-  -        40 Sales
-  -        50 Mgr
Figure 600, Outer join V1.ID < 30, (gives wrong answer - see text)
The last two checks in the above query ensure that every V2 row is returned. But they also
have the effect of returning the NAME field from the V1 table whenever there is a match.
Given our intentions, this should not happen.
SUMMARY: Query WHERE conditions are applied after the join. When used in an outer
join, this means that they are applied to all rows from all tables. In effect, this means that any
WHERE conditions in a full outer join will, in most cases, turn it into a form of inner join.
Cartesian Product
A Cartesian Product is a form of inner join, where the join predicates either do not exist, or
where they do a poor job of matching the keys in the joined tables.
STAFF_V1         STAFF_V2
+-----------+    +---------+
|ID|NAME    |    |ID|JOB   |
|--|--------|    |--|------|
|10|Sanders |    |20|Sales |
|20|Pernal  |    |30|Clerk |
|30|Marenghi|    |30|Mgr   |
+-----------+    |40|Sales |
                 |50|Mgr   |
                 +---------+
=========>
CARTESIAN-PRODUCT
====================
ID NAME     ID JOB
-- -------- -- -----
10 Sanders  20 Sales
10 Sanders  30 Clerk
10 Sanders  30 Mgr
10 Sanders  40 Sales
10 Sanders  50 Mgr
20 Pernal   20 Sales
20 Pernal   30 Clerk
20 Pernal   30 Mgr
20 Pernal   40 Sales
20 Pernal   50 Mgr
30 Marenghi 20 Sales
30 Marenghi 30 Clerk
30 Marenghi 30 Mgr
30 Marenghi 40 Sales
30 Marenghi 50 Mgr
SELECT   *
FROM     staff_v1 v1
        ,staff_v2 v2
ORDER BY v1.id
        ,v2.id
        ,v2.job;

SELECT   v2a.id
        ,v2a.job
        ,v2b.id
FROM     staff_v2 v2a
        ,staff_v2 v2b
WHERE    v2a.job = v2b.job
  AND    v2a.id  < 40
ORDER BY v2a.id
        ,v2b.id;
ANSWER
===========
ID JOB   ID
-- ----- --
20 Sales 20
20 Sales 40
30 Clerk 30
30 Mgr   30
30 Mgr   50

SELECT   v2.job
        ,COUNT(*) AS #rows
FROM     staff_v1 v1
        ,staff_v2 v2
GROUP BY v2.job
ORDER BY #rows
        ,v2.job;
ANSWER
===========
JOB   #ROWS
----- -----
Clerk     3
Mgr       6
Sales     6
Join Notes
Using the COALESCE Function
If you don't like working with nulls, but you need to do outer joins, then life is tough. In an
outer join, fields in non-matching rows are given null values as placeholders. Fortunately,
these nulls can be eliminated using the COALESCE function.
The COALESCE function can be used to combine multiple fields into one, and/or to eliminate null values where they occur. The result of the COALESCE is always the first non-null
value encountered. In the following example, the two ID fields are combined, and any null
NAME values are replaced with a question mark.
SELECT   COALESCE(v1.id,v2.id) AS id
        ,COALESCE(v1.name,'?') AS name
        ,v2.job
FROM     staff_v1 v1
FULL OUTER JOIN
         staff_v2 v2
ON       v1.id = v2.id
ORDER BY v1.id
        ,v2.job;
ANSWER
=================
ID NAME     JOB
-- -------- -----
10 Sanders  -
20 Pernal   Sales
30 Marenghi Clerk
30 Marenghi Mgr
40 ?        Sales
50 ?        Mgr
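The same technique can be applied to every null-extended field, not just NAME. The sketch below is an illustration added here: the 'n/a' literal is an arbitrary placeholder, and the common table expressions stand in for the two views:

```sql
WITH
staff_v1 (id, name) AS
  (VALUES (10,'Sanders'),(20,'Pernal'),(30,'Marenghi')),
staff_v2 (id, job) AS
  (VALUES (20,'Sales'),(30,'Clerk'),(30,'Mgr'),(40,'Sales'),(50,'Mgr'))
SELECT   COALESCE(v1.id,v2.id)   AS id     -- first non-null of the two keys
        ,COALESCE(v1.name,'?')   AS name   -- placeholder for missing V1 rows
        ,COALESCE(v2.job,'n/a')  AS job    -- placeholder for missing V2 rows
FROM     staff_v1 v1
FULL OUTER JOIN
         staff_v2 v2
ON       v1.id = v2.id
ORDER BY 1,3;
```

With this version the Sanders row, which has no V2 match, should show 'n/a' in the JOB column instead of a null.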
Imagine that we wanted to do an outer join on our two test views, only getting those rows that
do not match. This is a surprisingly hard query to write.
STAFF_V1         STAFF_V2
+-----------+    +---------+
|ID|NAME    |    |ID|JOB   |
|--|--------|    |--|------|
|10|Sanders |    |20|Sales |
|20|Pernal  |    |30|Clerk |
|30|Marenghi|    |30|Mgr   |
+-----------+    |40|Sales |
                 |50|Mgr   |
                 +---------+
NON-MATCHING
OUTER-JOIN
===========>
ANSWER
===================
ID NAME    ID JOB
-- ------- -- -----
10 Sanders -  -
-  -       40 Sales
-  -       50 Mgr
Figure 607, Example of outer join, only getting the non-matching rows
One way to express the above is to use the standard inner-join syntax:
SELECT   v1.*
        ,CAST(NULL AS SMALLINT) AS id
        ,CAST(NULL AS CHAR(5))  AS job
FROM     staff_v1 v1
WHERE    v1.id NOT IN
         (SELECT id FROM staff_v2)
UNION
SELECT   CAST(NULL AS SMALLINT)   AS id
        ,CAST(NULL AS VARCHAR(9)) AS name
        ,v2.*
FROM     staff_v2 v2
WHERE    v2.id NOT IN
         (SELECT id FROM staff_v1)
ORDER BY 1,3,4;
The above question can also be expressed using the outer-join syntax, but it requires the use
of two nested-table expressions. These are used to assign a label field to each table. Only
those rows where either of the two labels are null are returned:
SELECT   *
FROM    (SELECT v1.*
               ,'V1' AS flag
         FROM   staff_v1 v1) AS v1
FULL OUTER JOIN
        (SELECT v2.*
               ,'V2' AS flag
         FROM   staff_v2 v2) AS v2
ON       v1.id = v2.id
WHERE    v1.flag IS NULL
   OR    v2.flag IS NULL
ORDER BY v1.id
        ,v2.id
        ,v2.job;
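If the ID field is known to be not null in both inputs, the label fields are arguably unnecessary, because a null ID can then only come from the outer join itself. The following sketch rests on that assumption and uses common table expressions in place of the two views:

```sql
WITH
staff_v1 (id, name) AS
  (VALUES (10,'Sanders'),(20,'Pernal'),(30,'Marenghi')),
staff_v2 (id, job) AS
  (VALUES (20,'Sales'),(30,'Clerk'),(30,'Mgr'),(40,'Sales'),(50,'Mgr'))
SELECT   *
FROM     staff_v1 v1
FULL OUTER JOIN
         staff_v2 v2
ON       v1.id =  v2.id
WHERE    v1.id IS NULL      -- row exists only in V2
   OR    v2.id IS NULL      -- row exists only in V1
ORDER BY v1.id
        ,v2.id
        ,v2.job;
```

Only the Sanders row and the V2 rows for IDs 40 and 50 should be returned. If either ID field could hold a genuine null, this shortcut would not be safe and the label fields would still be needed.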
Imagine that we want to get selected rows from the V1 view, and for each matching row, get
the corresponding JOB from the V2 view - if there is one:
STAFF_V1         STAFF_V2
+-----------+    +---------+
|ID|NAME    |    |ID|JOB   |
|--|--------|    |--|------|
|10|Sanders |    |20|Sales |
|20|Pernal  |    |30|Clerk |
|30|Marenghi|    |30|Mgr   |
+-----------+    |40|Sales |
                 |50|Mgr   |
                 +---------+
ANSWER
===================
ID NAME    ID JOB
-- ------- -- -----
10 Sanders -  -
20 Pernal  20 Sales
SELECT   v1.id
        ,v1.name
        ,v2.job
FROM     staff_v1 v1
LEFT OUTER JOIN
         staff_v2 v2
ON       v1.id  = v2.id
WHERE    v1.id <> 30
ORDER BY v1.id ;
ANSWER
=================
ID NAME     JOB
-- -------- -----
10 Sanders  -
20 Pernal   Sales

SELECT   v1.id
        ,v1.name
        ,(SELECT v2.job
          FROM   staff_v2 v2
          WHERE  v1.id = v2.id) AS jb
FROM     staff_v1 v1
WHERE    v1.id <> 30
ORDER BY v1.id;
ANSWER
=================
ID NAME     JB
-- -------- -----
10 Sanders  -
20 Pernal   Sales
The nested table expression in the SELECT is applied after all other joins and sub-queries
(i.e. in the FROM section of the query) are done.
Only one column and row (at most) can be returned by the expression.
Given the above restrictions, the following query will fail because more than one V2 row is
returned for every V1 row (for ID = 30):
SELECT   v1.id
        ,v1.name
        ,(SELECT v2.job
          FROM   staff_v2 v2
          WHERE  v1.id = v2.id) AS jb
FROM     staff_v1 v1
ORDER BY v1.id;
ANSWER
=================
ID NAME     JB
-- -------- -----
10 Sanders  -
20 Pernal   Sales
<error>
Figure 615, Outer Join done in SELECT phrase of SQL - gets error
To make the above query work for all IDs, we have to decide which of the two matching JOB
values for ID 30 we want. Let us assume that we want the maximum:
SELECT   v1.id
        ,v1.name
        ,(SELECT MAX(v2.job)
          FROM   staff_v2 v2
          WHERE  v1.id = v2.id) AS jb
FROM     staff_v1 v1
ORDER BY v1.id;
ANSWER
=================
ID NAME     JB
-- -------- -----
10 Sanders  -
20 Pernal   Sales
30 Marenghi Mgr
SELECT   v1.id
        ,v1.name
        ,MAX(v2.job) AS jb
FROM     staff_v1 v1
LEFT OUTER JOIN
         staff_v2 v2
ON       v1.id = v2.id
GROUP BY v1.id
        ,v1.name
ORDER BY v1.id ;
ANSWER
=================
ID NAME     JB
-- -------- -----
10 Sanders  -
20 Pernal   Sales
30 Marenghi Mgr
The SELECT expression can be placed in a CASE statement if needed. To illustrate, in the
following query we get the JOB from the V2 view, except when the person is a manager, in
which case we get the NAME from the corresponding row in the V1 view:
SELECT   v2.id
        ,CASE
            WHEN v2.job <> 'Mgr'
            THEN v2.job
            ELSE (SELECT v1.name
                  FROM   staff_v1 v1
                  WHERE  v1.id = v2.id)
         END AS j2
FROM     staff_v2 v2
ORDER BY v2.id
        ,j2;
ANSWER
===========
ID J2
-- --------
20 Sales
30 Clerk
30 Marenghi
40 Sales
50 -
If you want to retrieve two columns using this type of join, you need to have two independent
nested table expressions:
SELECT   v2.id
        ,v2.job
        ,(SELECT v1.name
          FROM   staff_v1 v1
          WHERE  v2.id = v1.id)
        ,(SELECT LENGTH(v1.name) AS n2
          FROM   staff_v1 v1
          WHERE  v2.id = v1.id)
FROM     staff_v2 v2
ORDER BY v2.id
        ,v2.job;
ANSWER
====================
ID JOB   NAME     N2
-- ----- -------- --
20 Sales Pernal    6
30 Clerk Marenghi  8
30 Mgr   Marenghi  8
40 Sales -         -
50 Mgr   -         -
SELECT   v2.id
        ,v2.job
        ,v1.name
        ,LENGTH(v1.name) AS n2
FROM     staff_v2 v2
LEFT OUTER JOIN
         staff_v1 v1
ON       v2.id = v1.id
ORDER BY v2.id
        ,v2.job;
ANSWER
====================
ID JOB   NAME     N2
-- ----- -------- --
20 Sales Pernal    6
30 Clerk Marenghi  8
30 Mgr   Marenghi  8
40 Sales -         -
50 Mgr   -         -
This join style lets one easily mix and match individual rows with the results of column functions. For example, the following query returns a running SUM of the ID column:
SELECT   v1.id
        ,v1.name
        ,(SELECT SUM(x1.id)
          FROM   staff_v1 x1
          WHERE  x1.id <= v1.id
         )AS sum_id
FROM     staff_v1 v1
ORDER BY v1.id;
ANSWER
==================
ID NAME     SUM_ID
-- -------- ------
10 Sanders      10
20 Pernal       30
30 Marenghi     60

SELECT   v1.id
        ,v1.name
        ,SUM(id) OVER(ORDER BY id) AS sum_id
FROM     staff_v1 v1
ORDER BY v1.id;
ANSWER
==================
ID NAME     SUM_ID
-- -------- ------
10 Sanders      10
20 Pernal       30
30 Marenghi     60
Imagine that one wants to get all of the rows in STAFF_V1, and to also join those matching
rows in STAFF_V2 where the JOB begins with an S:
STAFF_V1         STAFF_V2
+-----------+    +---------+
|ID|NAME    |    |ID|JOB   |
|--|--------|    |--|------|
|10|Sanders |    |20|Sales |
|20|Pernal  |    |30|Clerk |
|30|Marenghi|    |30|Mgr   |
+-----------+    |40|Sales |
                 |50|Mgr   |
                 +---------+
OUTER-JOIN CRITERIA
==================>
V1.ID     = V2.ID
V2.JOB LIKE 'S%'
ANSWER
=================
ID NAME     JOB
-- -------- -----
10 Sanders  -
20 Pernal   Sales
30 Marenghi -
SELECT   v1.id
        ,v1.name
        ,v2.job
FROM     staff_v1 v1
LEFT OUTER JOIN
         staff_v2 v2
ON       v1.id     = v2.id
WHERE    v2.job LIKE 'S%'
ORDER BY v1.id
        ,v2.job;
ANSWER (WRONG)
=================
ID NAME     JOB
-- -------- -----
20 Pernal   Sales

SELECT   v1.id
        ,v1.name
        ,v2.job
FROM     staff_v1 v1
LEFT OUTER JOIN
        (SELECT *
         FROM   staff_v2
         WHERE  job LIKE 'S%'
        )AS v2
ON       v1.id = v2.id
ORDER BY v1.id
        ,v2.job;
ANSWER
=================
ID NAME     JOB
-- -------- -----
10 Sanders  -
20 Pernal   Sales
30 Marenghi -

SELECT   v1.id
        ,v1.name
        ,(SELECT v2.job
          FROM   staff_v2 v2
          WHERE  v1.id     = v2.id
            AND  v2.job LIKE 'S%') AS job
FROM     staff_v1 v1
ORDER BY v1.id
        ,job;
ANSWER
=================
ID NAME     JOB
-- -------- -----
10 Sanders  -
20 Pernal   Sales
30 Marenghi -
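A further option, sketched here as an illustration rather than as one of the numbered figures, is to move the JOB check out of the WHERE and into the ON. Because JOB belongs to the table being joined to, the check then decides which V2 rows match, without removing any V1 rows. The common table expressions stand in for the two views:

```sql
WITH
staff_v1 (id, name) AS
  (VALUES (10,'Sanders'),(20,'Pernal'),(30,'Marenghi')),
staff_v2 (id, job) AS
  (VALUES (20,'Sales'),(30,'Clerk'),(30,'Mgr'),(40,'Sales'),(50,'Mgr'))
SELECT   v1.id
        ,v1.name
        ,v2.job
FROM     staff_v1 v1
LEFT OUTER JOIN
         staff_v2 v2
ON       v1.id     = v2.id
AND      v2.job LIKE 'S%'   -- limits the match, does not remove V1 rows
ORDER BY v1.id
        ,v2.job;
```

This should return all three V1 rows, with a JOB value only for Pernal.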
You get nulls in an outer join, whether you want them or not, because the fields in non-matching rows are set to null. If they bug you, use the COALESCE function to remove
them. See page 220 for an example.
From a logical perspective, all WHERE conditions are applied after the join. For performance reasons, DB2 may apply some checks before the join, especially in an inner
join, where doing this cannot affect the result set.
All WHERE conditions that join tables act as if they are doing an inner join, even when
they are written in an outer join.
The ON checks in a full outer join never remove rows. They simply determine what rows
are matching versus not (see page 214). To eliminate rows in an outer join, one must use
a WHERE condition.
The ON checks in a partial outer join work differently, depending on whether they are
against fields in the table being joined to, or joined from (see page 212).
A Cartesian Product is not an outer join. It is a poorly matching inner join. By contrast, a
true outer join gets both matching rows, and non-matching rows.
The NODENUMBER and PARTITION functions cannot be used in an outer join. These
functions only work on rows in real tables.
When the join is defined in the SELECT part of the query (see page 221), it is done after any
other joins and/or sub-queries specified in the FROM phrase. And it acts as if it is a left outer
join.
Complex Joins
When one joins multiple tables using an outer join, one must consider carefully exactly
what one wants to do, because the answer that one gets will depend upon how one writes the
query. To illustrate, the following query first gets a set of rows from the employee table, and
then joins (from the employee table) to both the activity and photo tables:
SELECT   eee.empno
        ,aaa.projno
        ,aaa.actno
        ,ppp.photo_format AS format
FROM     employee  eee
LEFT OUTER JOIN
         emp_act   aaa
ON       eee.empno           = aaa.empno
AND      aaa.emptime         = 1
AND      aaa.projno       LIKE 'M%1%'
LEFT OUTER JOIN
         emp_photo ppp
ON       eee.empno           = ppp.empno
AND      ppp.photo_format LIKE 'b%'
WHERE    eee.lastname     LIKE '%A%'
  AND    eee.empno           < '000170'
  AND    eee.empno          <> '000030'
ORDER BY eee.empno;
ANSWER
==========================
EMPNO  PROJNO ACTNO FORMAT
------ ------ ----- ------
000010 MA2110    10 -
000070 -          - -
000130 -          - bitmap
000150 MA2112    60 bitmap
000150 MA2112   180 bitmap
000160 MA2113    60 -

SELECT   eee.empno
        ,aaa.projno
        ,aaa.actno
        ,ppp.photo_format AS format
FROM     employee  eee
LEFT OUTER JOIN
         emp_act   aaa
ON       eee.empno           = aaa.empno
AND      aaa.emptime         = 1
AND      aaa.projno       LIKE 'M%1%'
LEFT OUTER JOIN
         emp_photo ppp
ON       aaa.empno           = ppp.empno
AND      ppp.photo_format LIKE 'b%'
WHERE    eee.lastname     LIKE '%A%'
  AND    eee.empno           < '000170'
  AND    eee.empno          <> '000030'
ORDER BY eee.empno;
ANSWER
==========================
EMPNO  PROJNO ACTNO FORMAT
------ ------ ----- ------
000010 MA2110    10 -
000070 -          - -
000130 -          - -
000150 MA2112    60 bitmap
000150 MA2112   180 bitmap
000160 MA2113    60 -
Figure 628, Join from Employee to Activity, then from Activity to Photo
The only difference between the above two queries is the first line of the second ON.
Mixing and matching inner and outer joins in the same query can cause one to get the wrong
answer. To illustrate, the next query has an inner join, followed by an outer join, followed by
an inner join. We are trying to do the following:
For each matching department, get the related employees. If no employees exist, do not
list the department (i.e. inner join).
For each employee found, list their matching activities, if any (i.e. left outer join).
For each activity found, only list it if its project-name contains the letter "Q" (i.e. inner
join between activity and project).
Below is the wrong way to write this query. It is wrong because the final inner join (between
activity and project) turns the preceding outer join into an inner join. This causes an employee
to not show when there are no matching projects:
SELECT   ddd.deptno AS dp#
        ,eee.empno
        ,aaa.projno
        ,ppp.projname
FROM    (SELECT *
         FROM   department
         WHERE  deptname     LIKE '%A%'
           AND  deptname NOT LIKE '%U%'
           AND  deptno          < 'E'
        )AS ddd
INNER JOIN
         employee eee
ON       ddd.deptno      = eee.workdept
AND      eee.lastname LIKE '%A%'
LEFT OUTER JOIN
         emp_act  aaa
ON       aaa.empno       = eee.empno
AND      aaa.emptime    <= 0.5
INNER JOIN
         project  ppp
ON       aaa.projno      = ppp.projno
AND      ppp.projname LIKE '%Q%'
ORDER BY ddd.deptno
        ,eee.empno
        ,aaa.projno;
ANSWER
================================
DP# EMPNO  PROJNO PROJNAME
--- ------ ------ --------------
C01 000030 IF1000 QUERY SERVICES
C01 000130 IF1000 QUERY SERVICES
SELECT   ddd.deptno AS dp#
        ,eee.empno
        ,xxx.projno
        ,xxx.projname
FROM    (SELECT *
         FROM   department
         WHERE  deptname     LIKE '%A%'
           AND  deptname NOT LIKE '%U%'
           AND  deptno          < 'E'
        )AS ddd
INNER JOIN
         employee eee
ON       ddd.deptno      = eee.workdept
AND      eee.lastname LIKE '%A%'
LEFT OUTER JOIN
        (SELECT aaa.empno
               ,aaa.emptime
               ,aaa.projno
               ,ppp.projname
         FROM   emp_act aaa
         INNER JOIN
                project ppp
         ON     aaa.projno      = ppp.projno
         AND    ppp.projname LIKE '%Q%'
        )AS xxx
ON       xxx.empno       = eee.empno
AND      xxx.emptime    <= 0.5
ORDER BY ddd.deptno
        ,eee.empno
        ,xxx.projno;
ANSWER
================================
DP# EMPNO  PROJNO PROJNAME
--- ------ ------ --------------
C01 000030 IF1000 QUERY SERVICES
C01 000130 IF1000 QUERY SERVICES
D21 000070 -      -
D21 000240 -      -
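The query above does the inner join between activity and project inside a nested table expression, so that it is resolved before the left outer join. A possible alternative, sketched here as an untested assumption, is to group the inner join with parentheses instead. It relies on the same DEPARTMENT, EMPLOYEE, EMP_ACT and PROJECT sample tables that the queries above use:

```sql
SELECT   ddd.deptno AS dp#
        ,eee.empno
        ,aaa.projno
        ,ppp.projname
FROM    (SELECT *
         FROM   department
         WHERE  deptname     LIKE '%A%'
           AND  deptname NOT LIKE '%U%'
           AND  deptno          < 'E'
        )AS ddd
INNER JOIN
         employee eee
ON       ddd.deptno      = eee.workdept
AND      eee.lastname LIKE '%A%'
LEFT OUTER JOIN
        (emp_act aaa                 -- parentheses make the inner join
         INNER JOIN                  -- resolve before the outer join
         project ppp
         ON     aaa.projno      = ppp.projno
         AND    ppp.projname LIKE '%Q%')
ON       aaa.empno       = eee.empno
AND      aaa.emptime    <= 0.5
ORDER BY ddd.deptno
        ,eee.empno
        ,aaa.projno;
```

If the parenthesized form behaves as expected, employees with no qualifying activity should still be listed, with null PROJNO and PROJNAME values.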
Sub-Query
Sub-queries are hard to use, tricky to tune, and often do some strange things. Consequently, a
lot of people try to avoid them, but this is stupid because sub-queries are really, really, useful.
Using a relational database and not writing sub-queries is almost as bad as not doing joins.
A sub-query is a special type of full-select that is used to relate one table to another without
actually doing a join. For example, it lets one select all of the rows in one table where some
related value exists, or does not exist, in another table.
Sample Tables
Two tables will be used in this section. Please note that the second sample table has a mixture
of null and not-null values:
CREATE TABLE table1
(t1a      CHAR(1)     NOT NULL
,t1b      CHAR(2)     NOT NULL
,PRIMARY KEY(t1a));
COMMIT;
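The definition of the second table and the statements that load the sample rows are not reproduced in this section. The following sketch would produce data matching the sample tables used below. The TABLE2 column types, and the choice of which TABLE2 columns are NOT NULL, are assumptions inferred from the sample data:

```sql
-- Assumed definition: T2C must be nullable, because the sample data has a null in it
CREATE TABLE table2
(t2a      CHAR(1)     NOT NULL
,t2b      CHAR(1)     NOT NULL
,t2c      CHAR(1));

INSERT INTO table1 VALUES ('A','AA'),('B','BB'),('C','CC');

INSERT INTO table2 VALUES ('A','A','A'),('B','A',NULL);

COMMIT;
```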
TABLE1       TABLE2
+-------+    +-----------+
|T1A|T1B|    |T2A|T2B|T2C|
|---|---|    |---|---|---|
|A  |AA |    |A  |A  |A  |
|B  |BB |    |B  |A  | - |
|C  |CC |    +-----------+
+-------+    "-" = null
Sub-query Flavours
Sub-query Syntax
[Syntax diagram: ( subselect ), optionally preceded by SOME, ANY, ALL, [NOT] EXISTS, or [NOT] IN]
No Keyword Sub-Query
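One does not have to provide a SOME, or ANY, or IN, or any other keyword, when writing a
sub-query. But if one does not, there are three possible results:
If no row in the sub-query result matches, the answer is false.
If one row in the sub-query result matches, the answer is true.
If more than one row in the sub-query result matches, you get a SQL error.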
In the example below, the T1A field in TABLE1 is checked to see if it equals the result of the
sub-query (against T2A in TABLE2). For the value "A" there is a match, while for the values
"B" and "C" there is no match:
SELECT *
FROM   table1
WHERE  t1a =
      (SELECT t2a
       FROM   table2
       WHERE  t2a = 'A');
ANSWER     SUB-Q
=======    RESLT
T1A T1B    +---+
--- ---    |T2A|
A   AA     |---|
           |A  |
           +---+
TABLE1       TABLE2
+-------+    +-----------+
|T1A|T1B|    |T2A|T2B|T2C|
|---|---|    |---|---|---|
|A  |AA |    |A  |A  |A  |
|B  |BB |    |B  |A  | - |
|C  |CC |    +-----------+
+-------+    "-" = null

ANSWER     SUB-Q
=======    RESLT
<error>    +---+
           |T2A|
           |---|
           |A  |
           |B  |
           +---+
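The error case can be sketched as follows. This statement is an illustration added here, with common table expressions standing in for the two sample tables. The sub-query has no predicate, so it returns both T2A values:

```sql
WITH
table1 (t1a, t1b) AS
  (VALUES ('A','AA'),('B','BB'),('C','CC')),
table2 (t2a, t2b, t2c) AS
  (VALUES ('A','A','A'),('B','A',CAST(NULL AS CHAR(1))))
SELECT *
FROM   table1
WHERE  t1a =
      (SELECT t2a           -- returns two rows, "A" and "B"
       FROM   table2);
```

DB2 should reject this with SQL0811N (SQLSTATE 21000), because a sub-query used with a plain comparison operator may return at most one row.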
TABLE1       TABLE2
+-------+    +-----------+
|T1A|T1B|    |T2A|T2B|T2C|
|---|---|    |---|---|---|
|A  |AA |    |A  |A  |A  |
|B  |BB |    |B  |A  | - |
|C  |CC |    +-----------+
+-------+    "-" = null
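When a SOME or ANY sub-query check is used, there are two possible results:
If any row in the sub-query result matches, the answer is true.
If the sub-query result is empty, or all nulls, the answer is false.
If no value found in the sub-query result matches, the answer is also false.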
The query below compares the current T1A value against the sub-query result three times.
The first row (i.e. T1A = "A") fails the test, while the next two rows pass:
SELECT *
FROM   table1
WHERE  t1a > ANY
      (SELECT t2a
       FROM   table2);
ANSWER     SUB-Q
=======    RESLT
T1A T1B    +---+
--- ---    |T2A|
B   BB     |---|
C   CC     |A  |
           |B  |
           +---+
TABLE1       TABLE2
+-------+    +-----------+
|T1A|T1B|    |T2A|T2B|T2C|
|---|---|    |---|---|---|
|A  |AA |    |A  |A  |A  |
|B  |BB |    |B  |A  | - |
|C  |CC |    +-----------+
+-------+    "-" = null
> ALL(sub-query)
< ALL(sub-query)
When an ALL sub-query check is used, there are two possible results:
If all rows in the sub-query result match, the answer is true.
If there are no rows in the sub-query result, the answer is also true.
If any row in the sub-query result does not match, or is null, the answer is false.
Below is a typical example of the ALL check usage. Observe that a TABLE1 row is returned
only if the current T1A value equals all of the rows in the sub-query result:
SELECT *
FROM   table1
WHERE  t1a = ALL
      (SELECT t2b
       FROM   table2
       WHERE  t2b >= 'A');
ANSWER     SUB-Q
=======    RESLT
T1A T1B    +---+
--- ---    |T2B|
A   AA     |---|
           |A  |
           |A  |
           +---+

ANSWER     SUB-Q
=======    RESLT
T1A T1B    +---+
--- ---    |T2B|
A   AA     |---|
B   BB     +---+
C   CC
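The empty-set rule for ALL can be sketched with the following statement. It is an illustration added here, with common table expressions standing in for the two sample tables. No T2B value is greater than or equal to "X", so the sub-query result is empty:

```sql
WITH
table1 (t1a, t1b) AS
  (VALUES ('A','AA'),('B','BB'),('C','CC')),
table2 (t2a, t2b, t2c) AS
  (VALUES ('A','A','A'),('B','A',CAST(NULL AS CHAR(1))))
SELECT *
FROM   table1
WHERE  t1a = ALL
      (SELECT t2b
       FROM   table2
       WHERE  t2b >= 'X');  -- no rows qualify
```

Because an ALL check against an empty set is true, all three TABLE1 rows should be returned.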
Imagine that one wanted to get a row from TABLE1 where the T1A value matched all of the
sub-query result rows, but if the latter was an empty set (i.e. no rows), one wanted to get a
non-match. Try this:
SELECT *
FROM   table1
WHERE  t1a = ALL
      (SELECT t2b
       FROM   table2
       WHERE  t2b >= 'X')
  AND  0 <>
      (SELECT COUNT(*)
       FROM   table2
       WHERE  t2b >= 'X');
ANSWER     SQ-#1    SQ-#2
======     RESLT    RESLT
0 rows     +---+    +---+
           |T2B|    |(*)|
           |---|    |---|
           +---+    |0  |
                    +---+
TABLE1       TABLE2
+-------+    +-----------+
|T1A|T1B|    |T2A|T2B|T2C|
|---|---|    |---|---|---|
|A  |AA |    |A  |A  |A  |
|B  |BB |    |B  |A  | - |
|C  |CC |    +-----------+
+-------+    "-" = null
Figure 639, ALL sub-query, with extra check for empty set
Two sub-queries are done above: The first looks to see if all matching values in the sub-query
equal the current T1A value. The second confirms that the number of matching values in the
sub-query is not zero.
WARNING: Observe that the ANY sub-query check returns false when used against an
empty set, while a similar ALL check returns true.
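The ANY half of this warning can be sketched as follows. The statement is an illustration added here, with common table expressions standing in for the two sample tables. It applies an ANY check to a sub-query that returns no rows:

```sql
WITH
table1 (t1a, t1b) AS
  (VALUES ('A','AA'),('B','BB'),('C','CC')),
table2 (t2a, t2b, t2c) AS
  (VALUES ('A','A','A'),('B','A',CAST(NULL AS CHAR(1))))
SELECT *
FROM   table1
WHERE  t1a = ANY
      (SELECT t2b
       FROM   table2
       WHERE  t2b >= 'X');  -- empty set: ANY is false for every row
```

No rows should be returned. Changing the keyword from ANY to ALL, and nothing else, should return every TABLE1 row.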
EXISTS Keyword Sub-Query
So far, we have been taking a value from the TABLE1 table and comparing it against one or
more rows in the TABLE2 table. The EXISTS phrase does not compare values against rows,
rather it simply looks for the existence or non-existence of rows in the sub-query result set:
Below is an EXISTS check that, given our sample data, always returns true:
SELECT *
FROM   table1
WHERE  EXISTS
      (SELECT *
       FROM   table2);
ANSWER
=======
T1A T1B
--- ---
A   AA
B   BB
C   CC
TABLE1       TABLE2
+-------+    +-----------+
|T1A|T1B|    |T2A|T2B|T2C|
|---|---|    |---|---|---|
|A  |AA |    |A  |A  |A  |
|B  |BB |    |B  |A  | - |
|C  |CC |    +-----------+
+-------+    "-" = null

ANSWER
======
0 rows
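An EXISTS check that never finds a row, and so always returns false, can be sketched as follows. This statement is an illustration added here, with common table expressions standing in for the two sample tables:

```sql
WITH
table1 (t1a, t1b) AS
  (VALUES ('A','AA'),('B','BB'),('C','CC')),
table2 (t2a, t2b, t2c) AS
  (VALUES ('A','A','A'),('B','A',CAST(NULL AS CHAR(1))))
SELECT *
FROM   table1
WHERE  EXISTS
      (SELECT *
       FROM   table2
       WHERE  t2b = 'X');   -- no such row, so EXISTS is false
```

No rows should be returned. Note that replacing the "SELECT *" in the sub-query with a column function such as COUNT(*) changes the outcome, because a column function without a GROUP BY always returns one row, even when nothing qualifies.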
SELECT *
FROM   table1
WHERE  EXISTS
      (SELECT COUNT(*)
       FROM   table2
       WHERE  t2b = 'X');
ANSWER
=======
T1A T1B
--- ---
A   AA
B   BB
C   CC
TABLE1       TABLE2
+-------+    +-----------+
|T1A|T1B|    |T2A|T2B|T2C|
|---|---|    |---|---|---|
|A  |AA |    |A  |A  |A  |
|B  |BB |    |B  |A  | - |
|C  |CC |    +-----------+
+-------+    "-" = null
The NOT EXISTS phrase looks for the non-existence of rows in the sub-query result set.
We can use a NOT EXISTS check to create something similar to an ALL check, but with one
very important difference. The two checks will handle nulls differently. To illustrate, consider
the following two queries, both of which will return a row from TABLE1 only when it equals
all of the matching rows in TABLE2:
SELECT *
FROM   table1
WHERE  NOT EXISTS
      (SELECT *
       FROM   table2
       WHERE  t2c >= 'A'
         AND  t2c <> t1a);

SELECT *
FROM   table1
WHERE  t1a = ALL
      (SELECT t2c
       FROM   table2
       WHERE  t2c >= 'A');
ANSWERS
=======
T1A T1B
--- ---
A   AA
TABLE1       TABLE2
+-------+    +-----------+
|T1A|T1B|    |T2A|T2B|T2C|
|---|---|    |---|---|---|
|A  |AA |    |A  |A  |A  |
|B  |BB |    |B  |A  | - |
|C  |CC |    +-----------+
+-------+    "-" = null
Figure 643, NOT EXISTS vs. ALL, ignore nulls, find match
The above two queries are very similar. Both define a set of rows in TABLE2 where the T2C
value is greater than or equal to "A", and then both look for matching TABLE2 rows that are
not equal to the current T1A value. If a row is found, the sub-query is false.
What happens when no TABLE2 rows match the ">=" predicate? As is shown below, both of
our test queries treat an empty set as a match:
SELECT *
FROM   table1
WHERE  NOT EXISTS
      (SELECT *
       FROM   table2
       WHERE  t2c >= 'X'
         AND  t2c <> t1a);

SELECT *
FROM   table1
WHERE  t1a = ALL
      (SELECT t2c
       FROM   table2
       WHERE  t2c >= 'X');
ANSWERS
=======
T1A T1B
--- ---
A   AA
B   BB
C   CC
TABLE1       TABLE2
+-------+    +-----------+
|T1A|T1B|    |T2A|T2B|T2C|
|---|---|    |---|---|---|
|A  |AA |    |A  |A  |A  |
|B  |BB |    |B  |A  | - |
|C  |CC |    +-----------+
+-------+    "-" = null
One might think that the above two queries are logically equivalent, but they are not. As is
shown below, they return different results when the sub-query answer set can include nulls:
SELECT *
FROM   table1
WHERE  NOT EXISTS
      (SELECT *
       FROM   table2
       WHERE  t2c <> t1a);
ANSWER
=======
T1A T1B
--- ---
A   AA

SELECT *
FROM   table1
WHERE  t1a = ALL
      (SELECT t2c
       FROM   table2);
ANSWER
=======
no rows
TABLE1       TABLE2
+-------+    +-----------+
|T1A|T1B|    |T2A|T2B|T2C|
|---|---|    |---|---|---|
|A  |AA |    |A  |A  |A  |
|B  |BB |    |B  |A  | - |
|C  |CC |    +-----------+
+-------+    "-" = null
In the ALL sub-query, each value in T1A is checked against all of the values in T2C. The
null value is checked, deemed to differ, and so the sub-query always returns false.
In the NOT EXISTS sub-query, each value in T1A is used to find those T2C values that
are not equal. For the T1A values "B" and "C", the T2C value "A" does not equal, so the
NOT EXISTS check will fail. But for the T1A value "A", there are no "not equal" values
in T2C, because a null value does not "not equal" a literal. So the NOT EXISTS check
will pass.
The following three queries list those T2C values that do "not equal" a given T1A value:
SELECT *
FROM   table2
WHERE  t2c <> 'A';
ANSWER
===========
T2A T2B T2C
--- --- ---
no rows

SELECT *
FROM   table2
WHERE  t2c <> 'B';
ANSWER
===========
T2A T2B T2C
--- --- ---
A   A   A

SELECT *
FROM   table2
WHERE  t2c <> 'C';
ANSWER
===========
T2A T2B T2C
--- --- ---
A   A   A

ANSWER
=======
no rows
TABLE1       TABLE2
+-------+    +-----------+
|T1A|T1B|    |T2A|T2B|T2C|
|---|---|    |---|---|---|
|A  |AA |    |A  |A  |A  |
|B  |BB |    |B  |A  | - |
|C  |CC |    +-----------+
+-------+    "-" = null
IN Keyword Sub-Query
If all of the values in the sub-query result are null, the answer is false.
Below is an example that compares the T1A and T2A columns. Two rows match:
SELECT *
FROM
table1
WHERE t1a IN
(SELECT t2a
FROM
table2);
ANSWER
=======
T1A T1B
--- ---
A   AA
B   BB
TABLE1
+-------+
|T1A|T1B|
|---|---|
|A |AA |
|B |BB |
|C |CC |
+-------+
TABLE2
+-----------+
|T2A|T2B|T2C|
|---|---|---|
|A |A |A |
|B |A | - |
+-----------+
"-" = null
ANSWER
======
0 rows
SELECT *
FROM   table2
WHERE  t2c = ANY
      (SELECT t2c
       FROM   table2);

ANSWER
===========
T2A T2B T2C
--- --- ---
A   A   A
TABLE2
+-----------+
|T2A|T2B|T2C|
|---|---|---|
|A |A |A |
|B |A | - |
+-----------+
"-" = null
Sub-queries that look for the non-existence of a row work largely as one would expect, except
when a null value is involved. To illustrate, consider the following query, where we want to
see if the current T1A value is not in the set of T2C values:
SELECT *
FROM
table1
WHERE t1a NOT IN
(SELECT t2c
FROM
table2);
ANSWER
======
0 rows
TABLE1
+-------+
|T1A|T1B|
|---|---|
|A |AA |
|B |BB |
|C |CC |
+-------+
TABLE2
+-----------+
|T2A|T2B|T2C|
|---|---|---|
|A |A |A |
|B |A | - |
+-----------+
"-" = null
Observe that the T1A values "B" and "C" are obviously not in T2C, yet they are not returned.
The sub-query result set contains the value null, which causes the NOT IN check to return
unknown, which equates to false.
The next example removes the null values from the sub-query result, which then enables the
NOT IN check to find the non-matching values:
SELECT *
FROM
table1
WHERE t1a NOT IN
(SELECT t2c
FROM
table2
WHERE t2c IS NOT NULL);
ANSWER
=======
T1A T1B
--- ---
B   BB
C   CC
TABLE1
+-------+
|T1A|T1B|
|---|---|
|A |AA |
|B |BB |
|C |CC |
+-------+
TABLE2
+-----------+
|T2A|T2B|T2C|
|---|---|---|
|A |A |A |
|B |A | - |
+-----------+
"-" = null
ANSWER
=======
T1A T1B
--- ---
B   BB
C   CC
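A correlated NOT EXISTS is another way to sidestep the null problem, because its equality predicate simply never matches a null T2C value. The following is a sketch against the same sample tables. It should return the same two rows (B and C) without needing an explicit null check:

```sql
-- Sketch: the null in T2C cannot satisfy the equality, so it is ignored
SELECT *
FROM   table1 t1
WHERE  NOT EXISTS
      (SELECT *
       FROM   table2 t2
       WHERE  t1.t1a = t2.t2c);
```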
TABLE1
+-------+
|T1A|T1B|
|---|---|
|A |AA |
|B |BB |
|C |CC |
+-------+
TABLE2
+-----------+
|T2A|T2B|T2C|
|---|---|---|
|A |A |A |
|B |A | - |
+-----------+
"-" = null
An uncorrelated sub-query is one where the predicates in the sub-query part of the SQL statement
have no direct relationship to the current row being processed in the "top" table (hence uncorrelated). The following sub-query is uncorrelated:
SELECT *
FROM
table1
WHERE t1a IN
(SELECT t2a
FROM
table2);
ANSWER
=======
T1A T1B
--- ---
A   AA
B   BB
TABLE1
+-------+
|T1A|T1B|
|---|---|
|A |AA |
|B |BB |
|C |CC |
+-------+
TABLE2
+-----------+
|T2A|T2B|T2C|
|---|---|---|
|A |A |A |
|B |A | - |
+-----------+
"-" = null
ANSWER
=======
T1A T1B
--- ---
A   AA
B   BB
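For contrast, a correlated version of the same query can be sketched by adding a predicate that refers back to the row currently being processed in the outer table. The extra predicate is illustrative only. With the sample tables it does not change the answer (rows A and B):

```sql
-- Sketch: the inner WHERE refers to the current TABLE1 row,
-- so the sub-query cannot be resolved on its own
SELECT *
FROM   table1
WHERE  t1a IN
      (SELECT t2a
       FROM   table2
       WHERE  t1a = t2a);
```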
TABLE1
+-------+
|T1A|T1B|
|---|---|
|A |AA |
|B |BB |
|C |CC |
+-------+
TABLE2
+-----------+
|T2A|T2B|T2C|
|---|---|---|
|A |A |A |
|B |A | - |
+-----------+
"-" = null
SELECT *
FROM   table2 aa
WHERE  EXISTS
      (SELECT *
       FROM   table2 bb
       WHERE  aa.t2a = bb.t2b);

ANSWER
===========
T2A T2B T2C
--- --- ---
A   A   A
TABLE2
+-----------+
|T2A|T2B|T2C|
|---|---|---|
|A |A |A |
|B |A | - |
+-----------+
"-" = null
In general, if there is a suitable index on the sub-query table, use a correlated sub-query. Else,
use an uncorrelated sub-query. However, there are several very important exceptions to this
rule, and some queries can only be written one way.
NOTE: The DB2 optimizer is not as good at choosing the best access path for sub-queries
as it is with joins. Be prepared to spend some time doing tuning.
Multi-Field Sub-Queries
Imagine that you want to compare multiple items in your sub-query. The following examples
use an IN expression and a correlated EXISTS sub-query to do two equality checks:
SELECT *
FROM   table1
WHERE  (t1a,t1b) IN
      (SELECT t2a, t2b
       FROM   table2);

ANSWER
======
0 rows

SELECT *
FROM   table1
WHERE  EXISTS
      (SELECT *
       FROM   table2
       WHERE  t1a = t2a
         AND  t1b = t2b);

ANSWER
======
0 rows

TABLE1
+-------+
|T1A|T1B|
|---|---|
|A |AA |
|B |BB |
|C |CC |
+-------+
TABLE2
+-----------+
|T2A|T2B|T2C|
|---|---|---|
|A |A |A |
|B |A | - |
+-----------+
"-" = null

SELECT *
FROM   table1
WHERE  EXISTS
      (SELECT *
       FROM   table2
       WHERE  t1a  = t2a
         AND  t1b >= t2b);

ANSWER
=======
T1A T1B
--- ---
A   AA
B   BB
TABLE1
+-------+
|T1A|T1B|
|---|---|
|A |AA |
|B |BB |
|C |CC |
+-------+
TABLE2
+-----------+
|T2A|T2B|T2C|
|---|---|---|
|A |A |A |
|B |A | - |
+-----------+
"-" = null
Some business questions may require that the related SQL statement be written as a series of
nested sub-queries. In the following example, we are after all employees in the EMPLOYEE
table who have a salary that is greater than the maximum salary of all those other employees
that do not work on a project with a name beginning 'MA'.
SELECT   empno
        ,lastname
        ,salary
FROM     employee
WHERE    salary >
        (SELECT MAX(salary)
         FROM   employee
         WHERE  empno NOT IN
               (SELECT empno
                FROM   emp_act
                WHERE  projno LIKE 'MA%'))
ORDER BY 1;

ANSWER
=========================
EMPNO  LASTNAME  SALARY
------ --------- --------
000010 HAAS      52750.00
000110 LUCCHESSI 46500.00
Usage Examples
In this section we will use various sub-queries to compare our two test tables - looking for
those rows where none, any, ten, or all values match.
Beware of Nulls
The presence of null values greatly complicates sub-query usage. Not allowing for them when
they are present can cause one to get what is arguably a wrong answer. And do not assume
that just because you don't have any nullable fields that you will never therefore encounter a
null value. The DEPTNO column in the DEPARTMENT table is defined as not null, but in the following query, the maximum DEPTNO that is returned will be null:
SELECT   COUNT(*)    AS #rows
        ,MAX(deptno) AS maxdpt
FROM     department
WHERE    deptname LIKE 'Z%'
ORDER BY 1;

ANSWER
============
#ROWS MAXDPT
----- ------
    0 null
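If a null result of this kind is unwelcome, one way to guard against it is to wrap the column function in COALESCE. The following is a sketch only. The substitute value 'n/a' is an arbitrary choice made for illustration:

```sql
-- Sketch: replace the null that MAX returns on an empty set
SELECT   COUNT(*)                    AS #rows
        ,COALESCE(MAX(deptno),'n/a') AS maxdpt
FROM     department
WHERE    deptname LIKE 'Z%';
```

The same idea applies inside a sub-query: if a scalar sub-query can return null, decide up front what the comparison should do with it.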
Find all rows in TABLE1 where there are no rows in TABLE2 that have a T2C value equal to
the current T1A value in the TABLE1 table:
SELECT *
FROM   table1 t1
WHERE  0 =
      (SELECT COUNT(*)
       FROM   table2 t2
       WHERE  t1.t1a = t2.t2c);

SELECT *
FROM   table1 t1
WHERE  NOT EXISTS
      (SELECT *
       FROM   table2 t2
       WHERE  t1.t1a = t2.t2c);

SELECT *
FROM   table1
WHERE  t1a NOT IN
      (SELECT t2c
       FROM   table2
       WHERE  t2c IS NOT NULL);
TABLE1
+-------+
|T1A|T1B|
|---|---|
|A |AA |
|B |BB |
|C |CC |
+-------+
TABLE2
+-----------+
|T2A|T2B|T2C|
|---|---|---|
|A |A |A |
|B |A | - |
+-----------+
"-" = null
ANSWER
=======
T1A T1B
--- ---
B   BB
C   CC
Observe that in the last statement above we eliminated the null rows from the sub-query. Had
this not been done, the NOT IN check would have found them and then returned a result of
"unknown" (i.e. false) for all of the rows in the TABLE1 table.
Using a Join
Another way to answer the same problem is to use a left outer join, going from TABLE1 to
TABLE2 while matching on the T1A and T2C fields. Get only those rows (from TABLE1)
where the corresponding T2C value is null:
SELECT t1.*
FROM   table1 t1
LEFT OUTER JOIN
       table2 t2
ON     t1.t1a = t2.t2c
WHERE  t2.t2c IS NULL;

ANSWER
=======
T1A T1B
--- ---
B   BB
C   CC
Find all rows in TABLE1 where there are one, or more, rows in TABLE2 that have a T2C
value equal to the current T1A value:
SELECT *
FROM   table1 t1
WHERE  EXISTS
      (SELECT *
       FROM   table2 t2
       WHERE  t1.t1a = t2.t2c);

SELECT *
FROM   table1 t1
WHERE  1 <=
      (SELECT COUNT(*)
       FROM   table2 t2
       WHERE  t1.t1a = t2.t2c);

SELECT *
FROM   table1
WHERE  t1a = ANY
      (SELECT t2c
       FROM   table2);

SELECT *
FROM   table1
WHERE  t1a = SOME
      (SELECT t2c
       FROM   table2);

SELECT *
FROM   table1
WHERE  t1a IN
      (SELECT t2c
       FROM   table2);

TABLE1
+-------+
|T1A|T1B|
|---|---|
|A |AA |
|B |BB |
|C |CC |
+-------+
TABLE2
+-----------+
|T2A|T2B|T2C|
|---|---|---|
|A |A |A |
|B |A | - |
+-----------+
"-" = null

ANSWER
=======
T1A T1B
--- ---
A   AA
Using a Join
This question can also be answered using an inner join. The trick is to make a list of distinct
T2C values, and then join that list to TABLE1 using the T1A column. Several variations on
this theme are given below:
WITH t2 AS
(SELECT DISTINCT t2c
FROM
table2
)
SELECT t1.*
FROM
table1 t1
,t2
WHERE t1.t1a = t2.t2c;
TABLE1
+-------+
|T1A|T1B|
|---|---|
|A |AA |
|B |BB |
|C |CC |
+-------+
SELECT t1.*
FROM
table1 t1
,(SELECT DISTINCT t2c
FROM
table2
)AS t2
WHERE
t1.t1a = t2.t2c;
TABLE2
+-----------+
|T2A|T2B|T2C|
|---|---|---|
|A |A |A |
|B |A | - |
+-----------+
"-" = null
ANSWER
=======
T1A T1B
--- ---
A   AA
SELECT t1.*
FROM
table1 t1
INNER JOIN
(SELECT
DISTINCT t2c
FROM
table2
)AS t2
ON
t1.t1a = t2.t2c;
Find all rows in TABLE1 where there are exactly ten rows in TABLE2 that have a T2B value
equal to the current T1A value in the TABLE1 table:
SELECT *
FROM
table1 t1
WHERE 10 =
(SELECT
COUNT(*)
FROM
table2 t2
WHERE
t1.t1a = t2.t2b);
SELECT *
FROM   table1
WHERE  EXISTS
      (SELECT   t2b
       FROM     table2
       WHERE    t1a = t2b
       GROUP BY t2b
       HAVING   COUNT(*) = 10);

SELECT *
FROM   table1
WHERE  t1a IN
      (SELECT   t2b
       FROM     table2
       GROUP BY t2b
       HAVING   COUNT(*) = 10);
TABLE1
+-------+
|T1A|T1B|
|---|---|
|A |AA |
|B |BB |
|C |CC |
+-------+
TABLE2
+-----------+
|T2A|T2B|T2C|
|---|---|---|
|A |A |A |
|B |A | - |
+-----------+
"-" = null
ANSWER
======
0 rows
SELECT *
FROM
table1
WHERE (t1a,10) IN
(SELECT
t2b, COUNT(*)
FROM
table2
GROUP BY t2b);
ANSWER
======
0 rows
To answer this generic question using a join, one simply builds a distinct list of T2B values
that have ten rows, and then joins the result to TABLE1:
WITH t2 AS
(SELECT
t2b
FROM
table2
GROUP BY t2b
HAVING
COUNT(*) = 10
)
SELECT t1.*
FROM
table1 t1
,t2
WHERE t1.t1a = t2.t2b;
TABLE1
+-------+
|T1A|T1B|
|---|---|
|A |AA |
|B |BB |
|C |CC |
+-------+
TABLE2
+-----------+
|T2A|T2B|T2C|
|---|---|---|
|A |A |A |
|B |A | - |
+-----------+
"-" = null
ANSWER
======
0 rows
SELECT t1.*
FROM   table1 t1
      ,(SELECT   t2b
        FROM     table2
        GROUP BY t2b
        HAVING   COUNT(*) = 10
       )AS t2
WHERE  t1.t1a = t2.t2b;

SELECT t1.*
FROM   table1 t1
INNER JOIN
      (SELECT   t2b
       FROM     table2
       GROUP BY t2b
       HAVING   COUNT(*) = 10
      )AS t2
ON     t1.t1a = t2.t2b;
Find all rows in TABLE1 where all matching rows in TABLE2 have a T2B value equal to the
current T1A value in the TABLE1 table. Before we show some SQL, we need to decide what
to do about nulls and empty sets:
When nulls are found in the sub-query, we can either deem that their presence makes the
relationship false, which is what DB2 does, or we can exclude nulls from our analysis.
When there are no rows found in the sub-query, we can either say that the relationship is
false, or we can do as DB2 does, and say that the relationship is true.
SELECT *
FROM
table1
WHERE t1a = ALL
(SELECT t2b
FROM
table2);
SELECT *
FROM
table1
WHERE NOT EXISTS
(SELECT *
FROM
table2
WHERE t1a <> t2b);
TABLE1
+-------+
|T1A|T1B|
|---|---|
|A |AA |
|B |BB |
|C |CC |
+-------+
TABLE2
+-----------+
|T2A|T2B|T2C|
|---|---|---|
|A |A |A |
|B |A | - |
+-----------+
"-" = null
ANSWER
=======
T1A T1B
--- ---
A   AA

SELECT *
FROM   table1
WHERE  NOT EXISTS
      (SELECT *
       FROM   table2
       WHERE  t1a <> t2b
         AND  t2b >= 'X');

ANSWER
=======
T1A T1B
--- ---
A   AA
B   BB
C   CC
The next two queries differ from the above in how they address empty sets. The queries will
return a row from TABLE1 if the current T1A value matches all of the T2B values found in
the sub-query, but they will not return a row if no matching values are found:
SELECT *
FROM   table1
WHERE  t1a = ALL
      (SELECT t2b
       FROM   table2
       WHERE  t2b >= 'X')
  AND  0 <>
      (SELECT COUNT(*)
       FROM   table2
       WHERE  t2b >= 'X');

SELECT *
FROM   table1
WHERE  t1a IN
      (SELECT MAX(t2b)
       FROM   table2
       WHERE  t2b >= 'X'
       HAVING COUNT(DISTINCT t2b) = 1);

TABLE1
+-------+
|T1A|T1B|
|---|---|
|A |AA |
|B |BB |
|C |CC |
+-------+
TABLE2
+-----------+
|T2A|T2B|T2C|
|---|---|---|
|A |A |A |
|B |A | - |
+-----------+
"-" = null

ANSWER
======
0 rows
Figure 670, Sub-queries, true if all match, and at least one value found
Both of the above statements have flaws: The first processes the TABLE2 table twice, which
not only involves double work, but also requires that the sub-query predicates be duplicated.
The second statement is just plain strange.
        R1     R1     R1         R1         R1      R1
        UNION  UNION  INTERSECT  INTERSECT  EXCEPT  EXCEPT
R1  R2  R2     ALL    R2         ALL        R2      ALL
               R2                R2                 R2
--  --  -----  -----  ---------  ---------  ------  ------
A   A   A      A      A          A          E       A
A   A   B      A      B          A                  C
A   B   C      A      C          B                  C
B   B   D      A                 B                  E
B   B   E      A                 C
C   C          B
C   D          B
C              B
E              B
               B
               C
               C
               C
               C
               D
               E
Syntax Diagram
{SELECT statement | VALUES statement}
   {UNION | UNION ALL | EXCEPT | EXCEPT ALL | INTERSECT | INTERSECT ALL}
{SELECT statement | VALUES statement}
CREATE VIEW R1 (R1)
AS VALUES ('A'),('A'),('A'),('B'),('B'),('C'),('C'),('C'),('E');

CREATE VIEW R2 (R2)
AS VALUES ('A'),('A'),('B'),('B'),('B'),('C'),('D');

SELECT   R1
FROM     R1
ORDER BY R1;

SELECT   R2
FROM     R2
ORDER BY R2;

ANSWER
======
R1 R2
-- --
A  A
A  A
A  B
B  B
B  B
C  C
C  D
C
E
Usage Notes
Union & Union All
A UNION operation combines two sets of columns and removes duplicates. The UNION
ALL expression does the same but does not remove the duplicates.
SELECT   R1
FROM     R1
UNION
SELECT   R2
FROM     R2
ORDER BY 1;

SELECT   R1
FROM     R1
UNION ALL
SELECT   R2
FROM     R2
ORDER BY 1;

R1  R2  UNION  UNION ALL
--  --  =====  =========
A   A   A      A
A   A   B      A
A   B   C      A
B   B   D      A
B   B   E      A
C   C          B
C   D          B
C              B
E              B
               B
               C
               C
               C
               C
               D
               E
An INTERSECT operation retrieves the matching set of distinct values (not rows) from two
columns. The INTERSECT ALL returns the set of matching individual rows.
SELECT   R1
FROM     R1
INTERSECT
SELECT   R2
FROM     R2
ORDER BY 1;

SELECT   R1
FROM     R1
INTERSECT ALL
SELECT   R2
FROM     R2
ORDER BY 1;

R1  R2  INTERSECT  INTERSECT ALL
--  --  =========  =============
A   A   A          A
A   A   B          A
A   B   C          B
B   B              B
B   B              C
C   C
C   D
C
E
An EXCEPT operation retrieves the set of distinct data values (not rows) that exist in the first
table but not in the second. The EXCEPT ALL returns the set of individual rows that exist
only in the first table.
SELECT   R1
FROM     R1
EXCEPT
SELECT   R2
FROM     R2
ORDER BY 1;

SELECT   R1
FROM     R1
EXCEPT ALL
SELECT   R2
FROM     R2
ORDER BY 1;

        R1      R1
        EXCEPT  EXCEPT ALL
R1  R2  R2      R2
--  --  ======  ==========
A   A   E       A
A   A           C
A   B           C
B   B           E
B   B
C   C
C   D
C
E
SELECT   R2
FROM     R2
EXCEPT
SELECT   R1
FROM     R1
ORDER BY 1;

SELECT   R2
FROM     R2
EXCEPT ALL
SELECT   R1
FROM     R1
ORDER BY 1;

        R2      R2
        EXCEPT  EXCEPT ALL
R1  R2  R1      R1
--  --  ======  ==========
A   A   D       B
A   A           D
A   B
B   B
B   B
C   C
C   D
C
E
Precedence Rules
When multiple operations are done in the same SQL statement, there are precedence rules:
The next example illustrates how parentheses can be used to change the processing order:
SELECT   R1
FROM     R1
UNION
SELECT   R2
FROM     R2
EXCEPT
SELECT   R2
FROM     R2
ORDER BY 1;

ANSWER
======
E

(SELECT  R1
 FROM    R1
 UNION
 SELECT  R2
 FROM    R2
)EXCEPT
SELECT   R2
FROM     R2
ORDER BY 1;

ANSWER
======
E

SELECT   R1
FROM     R1
UNION
(SELECT  R2
 FROM    R2
 EXCEPT
 SELECT  R2
 FROM    R2
)ORDER BY 1;

ANSWER
======
A
B
C
E

R1  R2
--  --
A   A
A   A
A   B
B   B
B   B
C   C
C   D
C
E
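Parentheses matter most when INTERSECT is mixed with the other operators, because DB2 evaluates INTERSECT before UNION or EXCEPT when no parentheses are given. The sketch below is not one of the original figures. It reuses the R1 and R2 views and adds a one-row VALUES clause purely for illustration:

```sql
-- No parentheses: the INTERSECT is done first, so R2 is reduced
-- to the single value A, which is then unioned with R1
SELECT   R1
FROM     R1
UNION
SELECT   R2
FROM     R2
INTERSECT
VALUES   ('A')
ORDER BY 1;

-- Parentheses force the UNION to be done first, and the
-- INTERSECT is then applied to the combined result
(SELECT  R1
 FROM    R1
 UNION
 SELECT  R2
 FROM    R2
)INTERSECT
VALUES   ('A')
ORDER BY 1;
```

Given the sample data, the first statement should return A, B, C, and E, while the second should return only A.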
Imagine that one has a series of tables that track sales data, with one table for each year. One
can define a view that is the UNION ALL of these tables, so that a user would see them as a
single object. Such a view can support inserts, updates, and deletes, as long as each table in
the view has a constraint that distinguishes it from all the others. Below is an example:
CREATE TABLE sales_data_2002
(sales_date  DATE       NOT NULL
,daily_seq#  INTEGER    NOT NULL
,cust_id     INTEGER    NOT NULL
,amount      DEC(10,2)  NOT NULL
,invoice#    INTEGER    NOT NULL
,sales_rep   CHAR(10)   NOT NULL
,CONSTRAINT C CHECK (YEAR(sales_date) = 2002)
,PRIMARY KEY (sales_date, daily_seq#));

CREATE TABLE sales_data_2003
(sales_date  DATE       NOT NULL
,daily_seq#  INTEGER    NOT NULL
,cust_id     INTEGER    NOT NULL
,amount      DEC(10,2)  NOT NULL
,invoice#    INTEGER    NOT NULL
,sales_rep   CHAR(10)   NOT NULL
,CONSTRAINT C CHECK (YEAR(sales_date) = 2003)
,PRIMARY KEY (sales_date, daily_seq#));

CREATE VIEW sales_data AS
SELECT *
FROM   sales_data_2002
UNION ALL
SELECT *
FROM   sales_data_2003;
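Once the view exists, rows can be inserted through it, and the check constraints determine which underlying table receives each row. The statements below are a sketch only: the dates and the pairing of values are assumptions made for illustration, not data taken from the original figures.

```sql
-- Hypothetical rows: each one can only satisfy one table's
-- check constraint, so that is where it ends up
INSERT INTO sales_data VALUES ('2002-11-22',1,123,100.10,996,'SUE');
INSERT INTO sales_data VALUES ('2003-11-22',1,123,100.10,998,'FRED');

-- The user queries the view as if it were a single table
SELECT   *
FROM     sales_data
ORDER BY sales_date
        ,daily_seq#;
```

An insert with a date outside 2002 and 2003 would fail, because no table in the view can accept it.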
DAILY_SEQ# CUST_ID AMOUNT INVOICE# SALES_REP
---------- ------- ------ -------- ---------
         1     123 100.10      998 FRED
         1     123 100.10      996 SUE
         2     123  50.05      997 JOHN
OPTIMIZED QUERY
=================================
SELECT Q1.dept AS "dept"
,Q1.sum_id / Q1.count_rows
FROM
staff_summary AS Q1
Usage Notes
A materialized query table is defined using a variation of the standard CREATE TABLE
statement. Instead of providing an element list, one supplies a SELECT statement, and defines the refresh option:
CREATE [SUMMARY] TABLE table-name AS ( select stmt )
   DATA INITIALLY DEFERRED REFRESH {DEFERRED | IMMEDIATE}
   [ENABLE QUERY OPTIMIZATION | DISABLE QUERY OPTIMIZATION]
   [MAINTAINED BY SYSTEM | MAINTAINED BY USER]
REFRESH DEFERRED: The data is refreshed whenever one does a REFRESH TABLE.
At this point, DB2 will first delete all of the existing rows in the table, then run the select
statement defined in the CREATE to (you guessed it) repopulate.
REFRESH IMMEDIATE: Once created, this type of table has to be refreshed once using
the REFRESH statement. From then on, DB2 will maintain the materialized query table
in sync with the source table as changes are made to the latter.
Materialized query tables that are defined REFRESH IMMEDIATE are obviously the most
useful in that the data in them is always current. But they may cost quite a bit to maintain.
Query Optimization Options
ENABLE: The table is used for query optimization when appropriate. This is the default.
The table can also be queried directly.
DISABLE: The table will not be used for query optimization. It can be queried directly.
Maintain Options
SYSTEM: The data in the materialized query table is maintained by the system. This is
the default.
USER: The user is allowed to perform insert, update, and delete operations against the
materialized query table. The table cannot be refreshed. This type of table can be used
when you want to maintain your own materialized query table (e.g. using triggers) to
support features not provided by DB2. The table can also be defined to enable query optimization, but the optimizer will probably never use it as a substitute for a real table.
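To show how these options fit together, here is a sketch of a user-maintained table that the optimizer is told to leave alone. The table name and column list are hypothetical, and the definition is kept deliberately simple:

```sql
-- Sketch: refresh deferred, maintained by the user,
-- and not used for query optimization
CREATE TABLE staff_dept_user AS
  (SELECT   dept
           ,COUNT(*)    AS count_rows
           ,SUM(salary) AS sum_salary
   FROM     staff
   GROUP BY dept
)DATA INITIALLY DEFERRED REFRESH DEFERRED
 DISABLE QUERY OPTIMIZATION
 MAINTAINED BY USER;
```

Leaving out the last two lines gives the defaults described above: the table is maintained by the system and is enabled for query optimization.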
The following table compares materialized query table options to subsequent actions:
MATERIALIZED QUERY TABLE
==========================
REFRESH    MAINTAINED BY
=========  =============
DEFERRED   SYSTEM
           USER
IMMEDIATE  SYSTEM
Various restrictions apply to the select statement used to define the materialized query table:
Refresh Deferred Tables
Reference to a system catalogue table is not allowed. Reference to an explain table is allowed, but is imprudent.
If the query references more than one table or view, it must define an inner join, yet not
use the INNER JOIN syntax (i.e. must use old style).
If there is a GROUP BY, the SELECT list must have a COUNT(*) or COUNT_BIG(*)
column.
Besides the COUNT and COUNT_BIG, the only other column functions supported are
SUM and GROUPING - all without the DISTINCT phrase. Any field that allows nulls, and
that is summed, must also have a COUNT(column name) function defined.
The table must have at least one unique index defined, and the SELECT list must include
(amongst other things) all the columns of this index.
Grouping sets, CUBE and ROLLUP are allowed. The GROUP BY items and associated
GROUPING column functions in the select list must form a unique key of the result set.
A materialized query table defined REFRESH DEFERRED can be periodically updated using
the REFRESH TABLE command. Below is an example of such a table that has one row per
qualifying department in the STAFF table:
CREATE TABLE staff_names AS
  (SELECT   dept
           ,COUNT(*)          AS count_rows
           ,SUM(salary)       AS sum_salary
           ,AVG(salary)       AS avg_salary
           ,MAX(salary)       AS max_salary
           ,MIN(salary)       AS min_salary
           ,STDDEV(salary)    AS std_salary
           ,VARIANCE(salary)  AS var_salary
           ,CURRENT TIMESTAMP AS last_change
   FROM     staff
   WHERE    TRANSLATE(name) LIKE '%A%'
     AND    salary             > 10000
   GROUP BY dept
   HAVING   COUNT(*) = 1
)DATA INITIALLY DEFERRED REFRESH DEFERRED;
Unless told otherwise, the DB2 optimizer will not use a materialized query table that is defined refresh deferred, because it cannot guarantee that the data in the table is up to date. If it
is desired that such a table be referenced when appropriate, one has to set the REFRESH
AGE special register to a non-zero value:
SET CURRENT REFRESH AGE [=] {number | ANY | host-var}
AS age_ts
AS current_ts
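As a sketch of how the register might be used in practice, the statements below set the refresh age, display the current value, and then set it back to zero. The column aliases simply echo the two labels that survive above, and SYSIBM.SYSDUMMY1 is used only as a convenient one-row table:

```sql
-- Let the optimizer consider refresh-deferred tables of any age
SET CURRENT REFRESH AGE ANY;

-- Display the current setting
SELECT CURRENT REFRESH AGE AS age_ts
      ,CURRENT TIMESTAMP   AS current_ts
FROM   sysibm.sysdummy1;

-- Go back to zero, where such tables are not considered
SET CURRENT REFRESH AGE 0;
```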
SELECT   CHAR(tabschema,10) AS schema
        ,CHAR(tabname,20)   AS table
        ,type
        ,refresh
        ,refresh_time
        ,card               AS #rows
        ,DATE(create_time)  AS create_dt
        ,DATE(stats_time)   AS stats_dt
FROM     syscat.tables
WHERE    type = 'S'
ORDER BY 1,2;
SELECT   emp.workdept
        ,DEC(SUM(emp.salary),8,2)   AS sum_sal
        ,DEC(AVG(emp.salary),7,2)   AS avg_sal
        ,SMALLINT(COUNT(emp.comm))  AS #comms
        ,SMALLINT(COUNT(*))         AS #emps
FROM     employee emp
WHERE    emp.workdept     > 'C'
GROUP BY emp.workdept
HAVING   COUNT(*)        <> 5
   AND   SUM(emp.salary)  > 50000
ORDER BY sum_sal DESC;

SELECT   emp.workdept
        ,COUNT(*) AS #rows
FROM     employee emp
WHERE    emp.workdept IN
        (SELECT deptno
         FROM   department
         WHERE  deptname LIKE '%S%')
GROUP BY emp.workdept
HAVING   SUM(salary) > 50000;
SELECT   #emps
        ,DEC(SUM(sum_sal),9,2) AS sal_sal
        ,SMALLINT(COUNT(*))    AS #depts
FROM    (SELECT   emp.workdept
                 ,DEC(SUM(emp.salary),8,2) AS sum_sal
                 ,MAX(emp.salary)          AS max_sal
                 ,SMALLINT(COUNT(*))       AS #emps
         FROM     employee emp
         GROUP BY emp.workdept
        )AS XXX
GROUP BY #emps
HAVING   COUNT(*) > 1
ORDER BY #emps
FETCH FIRST 3 ROWS ONLY
OPTIMIZE FOR 3 ROWS;
All of the above materialized query tables have contained a GROUP BY in their definition.
But this is not necessary. To illustrate, we will first create a simple table:
CREATE TABLE staff_all
(id      SMALLINT      NOT NULL
,name    VARCHAR(9)    NOT NULL
,job     CHAR(5)
,salary  DECIMAL(7,2)
,PRIMARY KEY(id));
Both tables have identical primary keys (i.e. same number of columns).
Below is a query that can not use the EMP_SUMMARY table because of the reference to the
MAX function. Ironically, this query is exactly the same as the nested table expression above,
but in the prior example the MAX is ignored because it is never actually selected:
SELECT   emp.workdept
        ,DEC(SUM(emp.salary),8,2) AS sum_sal
        ,MAX(emp.salary)          AS max_sal
FROM     employee emp
GROUP BY emp.workdept;

SELECT   emp.workdept
        ,DEC(SUM(emp.salary),8,2) AS sum_sal
        ,COUNT(DISTINCT salary)   AS #salaries
FROM     employee emp
GROUP BY emp.workdept;
A materialized query table must be refreshed before it can be queried. If the table is defined refresh immediate, then the table will be maintained automatically after the initial
refresh.
Make sure to commit after doing a refresh. The refresh does not have an implied commit.
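In practice, the two points above come down to a pair of statements. The sketch below uses the STAFF_NAMES table defined earlier as the target, though any materialized query table name would do:

```sql
-- Populate (or repopulate) the materialized query table
REFRESH TABLE staff_names;

-- The refresh has no implied commit, so do one explicitly
COMMIT;
```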
Single-table materialized query tables save having to look at individual rows to resolve a
GROUP BY. Multi-table materialized query tables do this, and also avoid having to resolve a
join.
CREATE TABLE dept_emp_summary AS
  (SELECT   emp.workdept
           ,dpt.deptname
           ,COUNT(*)          AS num_rows
           ,COUNT(emp.salary) AS num_salary
           ,SUM(emp.salary)   AS sum_salary
           ,COUNT(emp.comm)   AS num_comm
           ,SUM(emp.comm)     AS sum_comm
   FROM     employee   emp
           ,department dpt
   WHERE    dpt.deptno = emp.workdept
   GROUP BY emp.workdept
           ,dpt.deptname
)DATA INITIALLY DEFERRED REFRESH IMMEDIATE;
SELECT   d.deptname
        ,d.deptno
        ,DEC(AVG(e.salary),7,2) AS avg_sal
        ,SMALLINT(COUNT(*))     AS #emps
FROM     department d
        ,employee   e
WHERE    e.workdept     = d.deptno
  AND    d.deptname  LIKE '%S%'
GROUP BY d.deptname
        ,d.deptno
HAVING   SUM(e.comm)    > 4000
ORDER BY avg_sal DESC;

SELECT   Q2.$C0 AS "deptname"
        ,Q2.$C1 AS "deptno"
        ,Q2.$C2 AS "avg_sal"
        ,Q2.$C3 AS "#emps"
FROM    (SELECT Q1.deptname                              AS $C0
               ,Q1.workdept                              AS $C1
               ,DEC((Q1.sum_salary / Q1.num_salary),7,2) AS $C2
               ,SMALLINT(Q1.num_rows)                    AS $C3
         FROM   dept_emp_summary AS Q1
         WHERE  (Q1.deptname LIKE '%S%')
           AND  (4000 < Q1.sum_comm)
        )AS Q2
ORDER BY Q2.$C2 DESC;
The join must be an inner join, and it must be written in the old style syntax.
Every table accessed in the join (except one?) must have a unique index.
The GROUP BY must include all of the fields that define the unique key for every table
(except one?) in the join.
Three-table Example
SELECT   d.deptno
        ,d.deptname
        ,DEC(AVG(a.emptime),5,2) AS avg_time
FROM     department d
        ,employee   e
        ,emp_act    a
WHERE    d.deptno     = e.workdept
  AND    e.empno      = a.empno
  AND    d.deptname LIKE '%S%'
  AND    e.firstnme LIKE '%S%'
GROUP BY d.deptno
        ,d.deptname
ORDER BY 3 DESC;

SELECT   Q4.$C0 AS "deptno"
        ,Q4.$C1 AS "deptname"
        ,Q4.$C2 AS "avg_time"
FROM    (SELECT   Q3.$C3                     AS $C0
                 ,Q3.$C2                     AS $C1
                 ,DEC((Q3.$C1 / Q3.$C0),5,2) AS $C2
         FROM    (SELECT   SUM(Q2.$C2) AS $C0
                          ,SUM(Q2.$C3) AS $C1
                          ,Q2.$C0      AS $C2
                          ,Q2.$C1      AS $C3
                  FROM    (SELECT Q1.deptname AS $C0
                                 ,Q1.workdept AS $C1
                                 ,Q1.num_time AS $C2
                                 ,Q1.sum_time AS $C3
                           FROM   dpt_emp_act_sumry AS Q1
                           WHERE  (Q1.firstnme LIKE '%S%')
                             AND  (Q1.DEPTNAME LIKE '%S%')
                          )AS Q2
                  GROUP BY Q2.$C1
                          ,Q2.$C0
                 )AS Q3
        )AS Q4
ORDER BY Q4.$C2 DESC;
To really make things fly, one can add indexes to the materialized query table columns. DB2
will then use these indexes to locate the required data. Certain restrictions apply:
The materialized query table must not be in a "check pending" status when the index is
defined. Run a refresh to address this problem.
Below are some indexes for the DPT_EMP_ACT_SUMRY table that was defined above:
CREATE INDEX dpt_emp_act_sumx1
ON dpt_emp_act_sumry
(workdept
,deptname
,empno
,firstnme);
CREATE INDEX dpt_emp_act_sumx2
ON dpt_emp_act_sumry
(num_rows);
SELECT   d.deptno
        ,d.deptname
        ,e.empno
        ,e.firstnme
        ,INT(AVG(a.emptime)) AS avg_time
FROM     department d
        ,employee   e
        ,emp_act    a
WHERE    d.deptno   = e.workdept
  AND    e.empno    = a.empno
  AND    d.deptno LIKE 'D%'
GROUP BY d.deptno
        ,d.deptname
        ,e.empno
        ,e.firstnme
ORDER BY 1,2,3,4;

SELECT   d.deptno
        ,d.deptname
        ,e.empno
        ,e.firstnme
        ,COUNT(*) AS #acts
FROM     department d
        ,employee   e
        ,emp_act    a
WHERE    d.deptno = e.workdept
  AND    e.empno  = a.empno
GROUP BY d.deptno
        ,d.deptname
        ,e.empno
        ,e.firstnme
HAVING   COUNT(*) >
ORDER BY 1,2,3,4;
Organizing by Dimensions
The following materialized query table is organized (clustered) by the two columns that are
referred to in the GROUP BY. Under the covers, DB2 will also create a dimension index on
each column, and a block index on both columns combined:
CREATE TABLE emp_sum AS
  (SELECT   workdept
           ,job
           ,SUM(salary)        AS sum_sal
           ,COUNT(*)           AS #emps
           ,GROUPING(workdept) AS grp_dpt
           ,GROUPING(job)      AS grp_job
   FROM     employee
   GROUP BY CUBE(workdept
                ,job))
DATA INITIALLY DEFERRED REFRESH DEFERRED
ORGANIZE BY DIMENSIONS (workdept, job)
IN tsempsum;
A staging table can be used to incrementally maintain a materialized query table that has been
defined refresh deferred. Using a staging table can result in a significant performance saving
(during the refresh) if the source table is very large, and is not changed very often.
NOTE: To use a staging table, the SQL statement used to define the target materialized
query table must follow the rules that apply for a table that is defined refresh immediate, even though it is defined refresh deferred.
A list of columns (with no attributes) in the target materialized query table. The column
names do not have to match those in the target table.
Either two or three additional columns with specific names- as provided by DB2.
dept
#rows
#sal
sum_sal
DEFERRED;
Figure 715, Staging table for the above materialized query table
Additional Columns
The two, or three, additional columns that every staging table must have are as follows:
OPERATIONTYPE: The operation type (i.e. insert, update, or delete). This column is
needed if the target materialized query table does not contain a GROUP BY statement.
To activate the staging table one must first use the SET INTEGRITY command to remove the
check pending flag, and then do a full refresh of the target materialized query table. After this
is done, the staging table will record all changes to the source table.
Use the refresh incremental command to apply the changes recorded in the staging table to
the target materialized query table.
SET INTEGRITY FOR emp_sumry_s STAGING IMMEDIATE UNCHECKED;
REFRESH TABLE emp_sumry;
<< make changes to the source table (i.e. employee) >>
REFRESH TABLE emp_sumry INCREMENTAL;
Use an identity column, which generates a unique value per row in a table.
Use a sequence, which generates a unique value per one or more tables.
You may need to know what values were generated during each insert. There are several
ways to do this:
For all of the above techniques, embed the insert inside a select statement (see figure 731
and/or page 54). This is probably the best solution.
The only way that one can be absolutely certain not to have a gap in the sequence of values
generated is to create your own using an insert trigger. However, this solution is probably the
least efficient of those listed here, and it certainly has the least concurrency.
There is almost never a valid business reason for requiring an unbroken sequence of values.
So the best thing to do, if your users ask for such a feature, is to beat them up.
Living With Sequence Errors
For efficiency reasons, identity column and sequence values are usually handed out (to users
doing inserts) in blocks of values, where the block size is defined using the CACHE option. If
a user inserts a row, and then dithers for a bit before inserting another, it is possible that some
other user (with a higher value) will insert first. In this case, the identity column or sequence
value will be a good approximation of the insert sequence, but not right on.
If the users need to know the precise order with which rows were inserted, then either set the
cache size to one, which will cost, or include a current timestamp value.
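Both suggestions can be combined in one table definition. The following is a sketch only: the table and column names are hypothetical, and NO CACHE stands in for "a cache size of one", since it causes every value to be handed out individually:

```sql
-- Sketch: values are allocated one at a time and in request order,
-- and each row also records when it was inserted
CREATE TABLE insert_order_demo
(id#   INTEGER   NOT NULL
       GENERATED ALWAYS AS IDENTITY
          (NO CACHE
           ORDER)
,ts1   TIMESTAMP NOT NULL WITH DEFAULT CURRENT TIMESTAMP
,dat1  CHAR(10)  NOT NULL
,PRIMARY KEY(id#));

-- Neither id# nor ts1 needs to be supplied
INSERT INTO insert_order_demo (dat1) VALUES ('first');
```

Note that the timestamp reflects when the insert ran, not when it was committed, so it too is only an approximation of the order in which rows became visible.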
Identity Columns
One can define a column in a DB2 table as an "identity column". This column, which must be
numeric (note: fractional fields not allowed), will be incremented by a fixed constant each
time a new row is inserted. Below is a syntax diagram for that part of a CREATE TABLE
statement that refers to an identity column definition:
column name  data type  GENERATED {ALWAYS | BY DEFAULT} AS IDENTITY
   ( START WITH   {1 | numeric constant}
     INCREMENT BY {1 | numeric constant}
     {NO MINVALUE | MINVALUE numeric constant}
     {NO MAXVALUE | MAXVALUE numeric constant}
     {NO CYCLE | CYCLE}
     {CACHE 20 | NO CACHE | CACHE integer constant}
     {NO ORDER | ORDER} )
The value is generated by DB2 only if the user does not provide a value (i.e. by default).
This configuration is typically used when the input is coming from an external source
(e.g. data propagation).
Rules
The column type must be numeric and must not allow fractional values. Any integer type
is OK. Decimal is also fine, as long as the scale is zero. Floating point is a no-no.
The identity column value is generated before any BEFORE triggers are applied. Use a
trigger transition variable to see the value.
A unique index is not required on the identity column, but it is a good idea. Certainly, if
the value is being created by DB2, then a non-unique index is a fairly stupid idea.
Unlike triggers, identity column logic is invoked and used during a LOAD. However, a
load-replace will not reset the identity column value. Use the RESTART command (see
below) to do this. An identity column is not affected by a REORG.
Syntax Notes
START WITH defines the start value, which can be any valid integer value. If no start
value is provided, then the default is the MINVALUE for ascending sequences, and the
MAXVALUE for descending sequences. If this value is also not provided, then the default is 1.
INCREMENT BY defines the interval between consecutive values. This can be any valid
integer value, though using zero is pretty silly. The default is 1.
MINVALUE defines (for ascending sequences) the value that the sequence will start at if
no start value is provided. It is also the value that an ascending sequence will begin again
at after it reaches the maximum and loops around. If no minimum value is provided, then
after reaching the maximum the sequence will begin again at the start value. If that is also
not defined, then the sequence will begin again at 1, which is the default start value.
For descending sequences, it is the minimum value that will be used before the sequence
loops around, and starts again at the maximum value.
MAXVALUE defines (for ascending sequences) the value that a sequence will stop at,
and then go back to the minimum value. For descending sequences, it is the start value (if
no start value is provided), and also the restart value - if the sequence reaches the minimum and loops around.
CYCLE defines whether the sequence should cycle about when it reaches the maximum
value (for an ascending sequences), or whether it should stop. The default is no cycle.
CACHE defines whether or not to allocate sequence values in chunks, and thus to save
on log writes. With NO CACHE, every row inserted causes a log write (to save the
current value). Note that the DB2 SQL Reference documents CACHE 20 as the value used
when the option is omitted, so specify NO CACHE explicitly if that is what is wanted.
If a cache value (2 or more) is provided, then the new values are assigned to a common
pool in blocks. Each insert user takes from the pool, and only when all of the values are
used is a new block (of values) allocated and a log write done. If the table is deactivated,
either normally or otherwise, then the values in the current block are discarded, resulting
in gaps in the sequence. Gaps in the sequence of values also occur when an insert is
subsequently rolled back, so they cannot be avoided entirely. But don't use the cache if
you want to try and minimize them.
ORDER defines whether all new rows inserted are assigned a sequence number in the
order that they were inserted. The default is no, which means that occasionally a row that
is inserted after another may get a slightly lower sequence number.
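The options described above can be combined in a single definition. The following is a minimal sketch, not taken from the original figures: the table name ORDER_DATA and its columns are hypothetical, and the values chosen are arbitrary.

```sql
-- Hypothetical table: the identity column starts at 1000, goes up in
-- steps of 10, and wraps back to MINVALUE after passing MAXVALUE.
CREATE TABLE order_data
(order#      INTEGER   NOT NULL
             GENERATED ALWAYS AS IDENTITY
               (START WITH   1000
               ,INCREMENT BY 10
               ,MINVALUE     1000
               ,MAXVALUE     99990
               ,CYCLE
               ,NO CACHE
               ,ORDER)
,order_date  DATE      NOT NULL
,PRIMARY KEY(order#));
```

Because the column is also the primary key, an insert done after the sequence has cycled will presumably fail with a duplicate-key error if the old rows are still in the table. CYCLE is shown here only to illustrate the syntax.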
The following example uses all of the defaults to start an identity column at one, and then to
go up in increments of one. The inserts will eventually die when they reach the maximum
allowed value for the field type (i.e. for small integer = 32K).
CREATE TABLE test_data
(key#   SMALLINT   NOT NULL
        GENERATED ALWAYS AS IDENTITY
,dat1   SMALLINT   NOT NULL
,ts1    TIMESTAMP  NOT NULL
,PRIMARY KEY(key#));
Figure 722, Identity column, odd values, then even, then stuck
Usage Examples
Below is the DDL for a simplified invoice table where the primary key is an identity column.
Observe that the invoice# is always generated by DB2:
ANSWER
==============================================================
INVOICE# SALE_DATE  CUSTOMER_ID PRODUCT_ID QUANTITY PRICE
-------- ---------- ----------- ---------- -------- -----
     100 2001-11-22 ABC                123      100 10.00
     101 2002-11-22 DEF                123      100 10.00
     102 2003-11-22 GHI                123      100 10.00
Imagine that the application is happily collecting invoices in the above table, but your silly
boss is unhappy because not enough invoices, as measured by the ever-ascending invoice#
value, are being generated per unit of time. We can improve things without actually fixing
any difficult business problems by simply altering the invoice# current value and the increment using the ALTER TABLE ... RESTART command:
ALTER TABLE invoice_data
ALTER COLUMN invoice#
RESTART WITH 1000
SET INCREMENT BY 2;
INVOICE# SALE_DATE  CUSTOMER_ID PRODUCT_ID QUANTITY PRICE
-------- ---------- ----------- ---------- -------- -----
     100 2001-11-22 ABC                123      100 10.00
     101 2002-11-22 DEF                123      100 10.00
     102 2003-11-22 GHI                123      100 10.00
    1000 2004-11-24 XXX                123      100 10.00
    1002 2004-11-25 YYY                123      100 10.00
The identity column options can be changed using the ALTER TABLE command:
RESTART            numeric constant
SET INCREMENT BY   numeric constant
SET                NO MINVALUE | MINVALUE numeric constant
SET                NO MAXVALUE | MAXVALUE numeric constant
SET                NO CYCLE | CYCLE
SET                NO ORDER | ORDER
If an identity column is generated always, and no cache is used, and the increment value is 1,
then there will usually be no gaps in the sequence of assigned values. But gaps can occur if an
insert is subsequently rolled back instead of committed. In the following example, there will be
no row in the table with customer number "1" after the rollback:
CREATE TABLE customers
(cust#   INTEGER    NOT NULL
         GENERATED ALWAYS AS IDENTITY (NO CACHE)
,cname   CHAR(10)   NOT NULL
,ctype   CHAR(03)   NOT NULL
,PRIMARY KEY (cust#));
COMMIT;
SELECT cust#                                            ANSWER
FROM   FINAL TABLE                                      ======
      (INSERT INTO customers                            CUST#
       VALUES (DEFAULT,'FRED','XXX'));                  -----
ROLLBACK;                                                   1

SELECT cust#                                            ANSWER
FROM   FINAL TABLE                                      ======
      (INSERT INTO customers                            CUST#
       VALUES (DEFAULT,'FRED','XXX'));                  -----
COMMIT;                                                     2
IDENTITY_VAL_LOCAL Function
There are two ways to find out what values were generated when one inserted a row into a
table with an identity column:
The function returns null if the user has not done a single-row insert in the current unit of
work. Therefore, the function has to be invoked before one does a commit. Having said
this, in some versions of DB2 it seems to work fine after a commit.
If the user inserts multiple rows into table(s) having identity columns in the same unit of
work, the result will be the value obtained from the last single-row insert. The result will
be null if there was none.
Multiple-row inserts are ignored by the function. So if the user first inserts one row, and
then separately inserts two rows (in a single SQL statement), the function will return the
identity column value generated during the first insert.
The function cannot be called in a trigger or SQL function. To get the current identity
column value in an insert trigger, use the trigger transition variable for the column. The
value, and thus the transition variable, is defined before the trigger is begun.
If invoked inside an insert statement (i.e. as an input value), the value will be taken from
the most recent (previous) single-row insert done in the same unit of work. The result will
be null if there was none.
The value returned by the function is unpredictable if the prior single-row insert failed. It
may be the value from the insert before, or it may be the value given to the failed insert.
The function is non-deterministic, which means that the result is determined at fetch time
(i.e. not at open) when used in a cursor. So if one fetches a row from a cursor, and then
does an insert, the next fetch may get a different value from the prior.
The value returned by the function may not equal the value in the table - if either a trigger
or an update has changed the field since the value was generated. This can only occur if
the identity column is defined as being "generated by default". An identity column that is
"generated always" cannot be updated.
When multiple users are inserting into the same table concurrently, each will see their
own most recent identity column value. They cannot see each other's.
If the above sounds unduly complex, it is because it is. It is often much easier to simply get
the values by embedding the insert inside a select:
SELECT MIN(cust#) AS minc                               ANSWER
      ,MAX(cust#) AS maxc                               ==============
      ,COUNT(*)   AS rows                               MINC MAXC ROWS
FROM   FINAL TABLE                                      ---- ---- ----
      (INSERT INTO customers                               3    5    3
       VALUES (DEFAULT,'FRED','xxx')
             ,(DEFAULT,'DAVE','yyy')
             ,(DEFAULT,'JOHN','zzz'));
Below are two examples of the function in use. Observe that the second invocation (done after the commit) returned a value, even though it is supposed to return null:
CREATE TABLE invoice_table
(invoice#     INTEGER         NOT NULL
              GENERATED ALWAYS AS IDENTITY
,sale_date    DATE            NOT NULL
,customer_id  CHAR(20)        NOT NULL
,product_id   INTEGER         NOT NULL
,quantity     INTEGER         NOT NULL
,price        DECIMAL(18,2)   NOT NULL
,PRIMARY KEY  (invoice#));
COMMIT;
                                                    <<< ANSWER
                                                        ======
                                                        ID
                                                        --
                                                         1
COMMIT;

WITH temp (id) AS                                   <<< ANSWER
  (VALUES (IDENTITY_VAL_LOCAL()))                       ======
SELECT *                                                ID
FROM   temp;                                            --
                                                         1
SELECT   invoice#             AS inv#
        ,sale_date
        ,IDENTITY_VAL_LOCAL() AS id
FROM     invoice_table
ORDER BY 1;
COMMIT;

ANSWER
==================
INV# SALE_DATE  ID
---- ---------- --
   1 11/22/2000  2
   2 11/23/2000  2
   3 11/24/2000  2
   4 11/25/2000  2

ANSWER
==================
INV# SALE_DATE  ID
---- ---------- --
   2 11/23/2000  2
Sequences
A sequence is almost the same as an identity column, except that it is an object that exists
outside of any particular table.
CREATE SEQUENCE fred
AS DECIMAL(31)
START WITH 100
INCREMENT BY 2
NO MINVALUE
NO MAXVALUE
NO CYCLE
CACHE 20
ORDER;
If the increment is zero, the sequence will stay at whatever value one started it with until it is
altered. This can be useful if one wants to have a constant that can be globally referenced:
CREATE SEQUENCE biggest_sale_to_date
AS INTEGER
START WITH 345678
INCREMENT BY 0;
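To change such a "constant", one alters the sequence. The following is a minimal sketch, assuming the BIGGEST_SALE_TO_DATE sequence defined above; the new value is made up for illustration.

```sql
-- Reset the zero-increment sequence to a new (hypothetical) value.
ALTER SEQUENCE biggest_sale_to_date
   RESTART WITH 456789;

-- Because the increment is zero, every NEXTVAL reference should
-- return the same value until the sequence is altered again.
VALUES NEXTVAL FOR biggest_sale_to_date;
```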
There is no concept of a current sequence value. Instead one can either retrieve the next or the
previous value (if there is one). And any reference to the next value will invariably cause the
sequence to be incremented. The following example illustrates this:
CREATE SEQUENCE fred;                                   ANSWER
COMMIT;                                                 ======
                                                        SEQ#
WITH temp1 (n1) AS                                      ----
  (VALUES 1                                                1
   UNION ALL                                               2
   SELECT n1 + 1                                           3
   FROM   temp1                                            4
   WHERE  n1 < 5                                           5
  )
SELECT NEXTVAL FOR fred AS seq#
FROM   temp1;
One retrieves the next or previous value using a "NEXTVAL FOR sequence-name", or a
"PREVVAL FOR sequence-name" call.
A NEXTVAL call generates and returns the next value in the sequence. Thus, each call
will consume the returned value. This remains true even if the statement that did the retrieval subsequently fails or is rolled back.
A PREVVAL call returns the most recently generated value for the specified sequence
for the current connection. Unlike when getting the next value, getting the prior value
does not alter the state of the sequence, so multiple calls can retrieve the same value.
If no NEXTVAL reference (to the target sequence) has been made for the current connection, any attempt to get the PREVVAL will result in a SQL error.
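The behavior described above can be seen with a few stand-alone VALUES statements. This is a minimal sketch: the sequence name DEMO_SEQ and its start and increment values are hypothetical, and the statements are assumed to run one after another in the same connection.

```sql
-- Hypothetical sequence used only for this illustration.
CREATE SEQUENCE demo_seq
   START WITH   10
   INCREMENT BY 5;

VALUES NEXTVAL FOR demo_seq;   -- generates and consumes a value
VALUES PREVVAL FOR demo_seq;   -- same value as the prior statement
VALUES PREVVAL FOR demo_seq;   -- unchanged, PREVVAL does not advance
VALUES NEXTVAL FOR demo_seq;   -- generates the following value
```

Running the first PREVVAL before any NEXTVAL (in a new connection) should fail with an SQL error, as noted above.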
SELECT INTO statement (within the select part), as long as there is no DISTINCT,
GROUP BY, UNION, EXCEPT, or INTERSECT.
A trigger.
DELETE statement.
CASE expression
SELECT statement where there is an outer select that contains a DISTINCT, GROUP
BY, UNION, EXCEPT, or INTERSECT.
Most sub-queries.
A trigger.
There are many more usage restrictions, but you presumably get the picture. See the DB2
SQL Reference for the complete list.
Usage Examples
Below a sequence is defined, then various next and previous values are retrieved:
ANSWERS
=======
===>  PRV
      ---
      <error>

===>  NXT
      ---
        1

===>  PRV
      ---
        1

===>  NXT PRV
      --- ---
        2   1
        3   1
        4   1
        5   1
        6   1
                                                        ANSWERS
                                                        =======
WITH temp1 AS                                     ===>  ID NXT
  (SELECT id                                            -- ---
         ,NEXTVAL FOR fred AS nxt                       50   5
   FROM   staff
   WHERE  id < 100
  )
SELECT *
FROM   temp1
WHERE  id = 50 + (nxt * 0);

                                                  ===>  NXT PRV
                                                        --- ---
                                                         10   9
Multi-table Usage
Imagine that one wanted to maintain a unique sequence of values over multiple tables. One
can do this by creating a before insert trigger on each table that replaces whatever value the
user provides with the current one from a common sequence. Below is an example:
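A minimal sketch of the trigger side of this design follows. It assumes that the US_CUSTOMER and INTL_CUSTOMER tables (used in the inserts that follow) already exist with a CUST# column; the sequence name CUST_SEQ and the trigger names are hypothetical.

```sql
-- One common sequence feeds both tables (the name is an assumption).
CREATE SEQUENCE cust_seq
   START WITH   1
   INCREMENT BY 1
   NO MAXVALUE
   NO CYCLE
   ORDER;

-- Before each row is inserted, overwrite whatever CUST# value the
-- user supplied with the next value from the common sequence.
CREATE TRIGGER us_cust_ins
NO CASCADE BEFORE INSERT ON us_customer
REFERENCING NEW AS nnn
FOR EACH ROW MODE DB2SQL
   SET nnn.cust# = NEXTVAL FOR cust_seq;

CREATE TRIGGER intl_cust_ins
NO CASCADE BEFORE INSERT ON intl_customer
REFERENCING NEW AS nnn
FOR EACH ROW MODE DB2SQL
   SET nnn.cust# = NEXTVAL FOR cust_seq;
```

Because both triggers draw from the same sequence, a value handed to one table should never be handed to the other.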
SELECT cust#                                            ANSWERS
      ,cname                                            ===========
FROM   FINAL TABLE                                      CUST# CNAME
      (INSERT INTO us_customer (cname, frst_sale, #sales)
       VALUES ('FRED','2002-10-22',1)                   ----- -----
             ,('JOHN','2002-10-23',1));                     1 FRED
                                                            2 JOHN

SELECT cust#                                            CUST# CNAME
      ,cname                                            ----- -----
FROM   FINAL TABLE                                          3 SUE
      (INSERT INTO intl_customer (cname, frst_sale, #sales)
       VALUES ('SUE','2002-11-12',2)                        4 DEB
             ,('DEB','2002-11-13',2));

ANSWER
======
PREV
----
   4
In the next example, two sequences are created: One records the number of rows deleted from
a table, while the other records the number of delete statements run against the same:
CREATE SEQUENCE delete_rows
   START WITH   1
   INCREMENT BY 1
   NO MAXVALUE
   NO CYCLE
   ORDER;

CREATE SEQUENCE delete_stmts
   START WITH   1
   INCREMENT BY 1
   NO MAXVALUE
   NO CYCLE
   ORDER;
CREATE TABLE customer
(cust#       INTEGER    NOT NULL
,cname       CHAR(10)   NOT NULL
,frst_sale   DATE       NOT NULL
,#sales      INTEGER    NOT NULL
,PRIMARY KEY (cust#));
Only one identity column is allowed per table, whereas a single table can have multiple
sequences and/or multiple references to the same sequence.
Sequences require triggers to automatically maintain column values (e.g. during inserts)
in tables. Identity columns do not.
Sequences can be incremented during inserts, updates, deletes (via triggers), or selects,
whereas identity columns only get incremented during inserts.
Sequences can be incremented (via triggers) once per row, or once per statement. Identity
columns are always updated per row inserted.
Sequences can be dropped and created independent of any tables that they might be used
to maintain values in. Identity columns are part of the table definition.
Identity columns are supported by the load utility. Trigger induced sequences are not.
For both types of sequence, one can get the current value by embedding the DML statement
inside a select (e.g. see figure 731). Alternatively, one can use the relevant expression to get
the current status. These differ as follows:
One cannot tell to which table an IDENTITY_VAL_LOCAL function result refers.
This can be a problem if one insert invokes another insert (via a trigger), which puts a
row in another table with its own identity column. By contrast, in the PREVVAL function
one explicitly identifies the sequence to be read.
The good news about the above setup is that it will never result in gaps in the sequence of
values. In particular, if a newly inserted row is rolled back after the insert is done, the next
insert will simply use the same invoice# value. But there is also bad news:
Only one user can insert at a time, because the select (in the trigger) needs to see the
highest invoice# in the table in order to complete.
Multiple rows cannot be inserted in a single SQL statement (i.e. a mass insert). The trigger is invoked before the rows are actually inserted, one row at a time, for all rows. Each
row would see the same, already existing, high invoice#, so the whole insert would die
due to a duplicate row violation.
There may be a tiny, tiny chance that if two users were to begin an insert at exactly the
same time that they would both see the same high invoice# (in the before trigger), and so
the last one to complete (i.e. to add a pointer to the unique invoice# index) would get a
duplicate-row violation.
Below are some inserts to the above table. Ignore the values provided in the first field - they
are replaced in the trigger. And observe that the third insert is rolled back:
INSERT INTO sales_invoice VALUES (0,'2001-06-22','ABC',123,10,1);
INSERT INTO sales_invoice VALUES (0,'2001-06-23','DEF',453,10,1);
COMMIT;
INSERT INTO sales_invoice VALUES (0,'2001-06-24','XXX',888,10,1);
ROLLBACK;
INSERT INTO sales_invoice VALUES (0,'2001-06-25','YYY',999,10,1);
COMMIT;
ANSWER
==============================================================
INVOICE# SALE_DATE  CUSTOMER_ID PRODUCT_ID QUANTITY PRICE
-------- ---------- ----------- ---------- -------- -----
       1 06/22/2001 ABC                123       10  1.00
       2 06/23/2001 DEF                453       10  1.00
       3 06/25/2001 YYY                999       10  1.00
The next design is more powerful in that it supports multi-row inserts, and also more than one
table if desired. It requires that there be a central location that holds the current high-value. In
the example below, this value will be in a row in a special control table. Every insert into the
related data table will, via triggers, first update, and then query, the row in the control table.
Control Table
The following table has one row per sequence of values being maintained:
CREATE TABLE control_table
(table_name
CHAR(18)
NOT NULL
,table_nmbr
INTEGER
NOT NULL
,PRIMARY KEY (table_name));
Data Table
The INVOICE# column will be populated, using triggers, during the insert process with a
unique ascending value. However, for part of the time during the insert the field will have
a null value, which is why it is defined as being both non-unique and allowing nulls.
CREATE TABLE invoice_table
(unqval       CHAR(13) FOR BIT DATA   NOT NULL
,invoice#     INTEGER
,sale_date    DATE                    NOT NULL
,customer_id  CHAR(20)                NOT NULL
,product_id   INTEGER                 NOT NULL
,quantity     INTEGER                 NOT NULL
,price        DECIMAL(18,2)           NOT NULL
,PRIMARY KEY(unqval));
The single row in the control table is a point of contention, because only one user can
update it at a time. One must therefore commit often (perhaps more often than one would
like to) in order to free up the locks on this row. Therefore, by implication, this design
puts one at the mercy of programmers.
The two extra updates add a considerable overhead to the cost of the insert.
The invoice number values generated by the AFTER trigger cannot be obtained by selecting
from an insert statement (see page 54). In fact, selecting from the FINAL TABLE will
result in a SQL error. One has to instead select from the NEW TABLE, which returns the
new rows before the AFTER trigger was applied.
As with ordinary sequences, this design enables one to have multiple tables referring to a single row in the control table, and thus using a common sequence.
Temporary Tables
Introduction
How one defines a temporary table depends in part upon how often, and for how long, one
intends to use it:
If one intends to use a temporary table just once, it can be defined as a nested table expression. In the following example, we use a temporary table to sequence the matching rows in
the STAFF table by descending salary. We then select the 2nd through 3rd rows:
SELECT   id
        ,salary
FROM    (SELECT s.*
               ,ROW_NUMBER() OVER(ORDER BY salary DESC) AS sorder
         FROM   staff s
         WHERE  id < 200                                ANSWER
        )AS xxx                                         =============
WHERE    sorder BETWEEN 2 AND 3                         ID  SALARY
ORDER BY id;                                            --- --------
                                                         50 20659.80
                                                        140 21150.00
Imagine that one wanted to get the percentage contribution of the salary in some set of rows
in the STAFF table - compared to the total salary for the same. The only way to do this is to
access the matching rows twice; Once to get the total salary (i.e. just one row), and then again
to join the total salary value to each individual salary - to work out the percentage.
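A minimal sketch of the double-access approach follows, assuming the sample STAFF table and the same predicates as the common table expression version shown later. Note that the two predicates have to be written out twice, once in each reference to the table.

```sql
-- The matching rows are read twice: once (in the nested table
-- expression) to get the total, and once to list each salary.
SELECT   a.id
        ,a.name
        ,a.salary
        ,b.sum_sal
        ,INT((a.salary * 100) / b.sum_sal) AS pct
FROM     staff a
        ,(SELECT SUM(salary) AS sum_sal
          FROM   staff
          WHERE  id < 100
            AND  UCASE(name) LIKE '%T%'
         )AS b
WHERE    a.id < 100
  AND    UCASE(a.name) LIKE '%T%'
ORDER BY a.id;
```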
Selecting the same set of rows twice in a single query is generally unwise because repeating
the predicates increases the likelihood of typos being made. In the next example, the desired
rows are first placed in a temporary table. Then the sum salary is calculated and placed in
another temporary table. Finally, the two temporary tables are joined to get the percentage:
WITH
rows_wanted AS
  (SELECT *
   FROM   staff
   WHERE  id < 100
     AND  UCASE(name) LIKE '%T%'
  ),
sum_salary AS
  (SELECT SUM(salary) AS sum_sal
   FROM   rows_wanted)
SELECT   id
        ,name
        ,salary
        ,sum_sal
        ,INT((salary * 100) / sum_sal) AS pct
FROM     rows_wanted
        ,sum_salary
ORDER BY id;

ANSWER
================================
ID NAME    SALARY   SUM_SAL  PCT
-- ------- -------- -------- ---
70 Rothman 16502.83 34504.58  47
90 Koonitz 18001.75 34504.58  52
To refer to a temporary table in multiple SQL statements in the same thread, one has to define
a declared global temporary table. An example follows:
DECLARE GLOBAL TEMPORARY TABLE session.fred
(dept        SMALLINT   NOT NULL
,avg_salary  DEC(7,2)   NOT NULL
,num_emps    SMALLINT   NOT NULL)
ON COMMIT PRESERVE ROWS;
COMMIT;

INSERT INTO session.fred
SELECT   dept
        ,AVG(salary)
        ,COUNT(*)
FROM     staff
WHERE    id > 200
GROUP BY dept;
COMMIT;
SELECT COUNT(*) AS cnt                                  ANSWER#1
FROM   session.fred;                                    ========
                                                        CNT
                                                        ---
                                                          4

SELECT *                                                ANSWER#2
FROM   session.fred;                                    ==========================
                                                        DEPT AVG_SALARY NUM_EMPS
                                                        ---- ---------- --------
                                                          10   20168.08        3
                                                          51   15161.43        3
                                                          66   17215.24        5
Use a WITH phrase at the top of the query to define a common table expression.
The following three queries, which are logically equivalent, illustrate the above syntax styles.
Observe that the first two queries are explicitly defined as left outer joins, while the last one is
implicitly a left outer join:
WITH staff_dept AS
  (SELECT   dept        AS dept#
           ,MAX(salary) AS max_sal
   FROM     staff
   WHERE    dept < 50
   GROUP BY dept
  )
SELECT   id
        ,dept
        ,salary
        ,max_sal
FROM     staff
LEFT OUTER JOIN
         staff_dept
ON       dept = dept#
WHERE    name LIKE 'S%'
ORDER BY id;

ANSWER
==========================
ID  DEPT SALARY   MAX_SAL
--- ---- -------- --------
 10   20 18357.50 18357.50
190   20 14252.75 18357.50
200   42 11508.60 18352.80
220   51 17654.50        -
SELECT   id
        ,dept
        ,salary
        ,max_sal
FROM     staff
LEFT OUTER JOIN
        (SELECT   dept        AS dept#
                 ,MAX(salary) AS max_sal
         FROM     staff
         WHERE    dept < 50
         GROUP BY dept
        )AS staff_dept
ON       dept = dept#
WHERE    name LIKE 'S%'
ORDER BY id;

ANSWER
==========================
ID  DEPT SALARY   MAX_SAL
--- ---- -------- --------
 10   20 18357.50 18357.50
190   20 14252.75 18357.50
200   42 11508.60 18352.80
220   51 17654.50        -
SELECT   id
        ,dept
        ,salary
        ,(SELECT   MAX(salary)
          FROM     staff s2
          WHERE    s1.dept = s2.dept
            AND    s2.dept < 50
          GROUP BY dept)
         AS max_sal
FROM     staff s1
WHERE    name LIKE 'S%'
ORDER BY id;

ANSWER
==========================
ID  DEPT SALARY   MAX_SAL
--- ---- -------- --------
 10   20 18357.50 18357.50
190   20 14252.75 18357.50
200   42 11508.60 18352.80
220   51 17654.50        -
A common table expression is a named temporary table that is retained for the duration of a
SQL statement. There can be many temporary tables in a single SQL statement. Each must
have a unique name and be defined only once.
All references to a temporary table (in a given SQL statement run) return the same result.
This is unlike tables, views, or aliases, which are derived each time they are called. Also
unlike tables, views, or aliases, temporary tables never contain indexes.
WITH identifier [( col. names )] AS ( select stmt | values stmt ) [, ...]
Column names must be specified if the expression is recursive, or if the query invoked
returns duplicate column names.
The number of column names (if any) that are specified must match the number of columns returned.
If there is more than one common-table-expression, latter ones (only) can refer to the
output from prior ones. Cyclic references are not allowed.
A common table expression with the same name as a real table (or view) will replace the
real table for the purposes of the query. The temporary and real tables cannot be referred
to in the same query.
Temporary table names must follow standard DB2 table naming standards.
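Several of the above rules can be seen in one short statement. The following is a minimal sketch with made-up names (TEMP1, TEMP2, COL1, COL2, CNT): the first expression is built from a VALUES statement, whose columns have no names of their own, so the column names are listed explicitly, and the second expression refers to the first.

```sql
WITH
temp1 (col1, col2) AS          -- names required: VALUES columns are unnamed
  (VALUES (1, 'AA')
         ,(2, 'BB')
         ,(3, 'CC')
  ),
temp2 (cnt) AS                 -- a later expression can refer to a prior one
  (SELECT COUNT(*)
   FROM   temp1
  )
SELECT   t1.col1
        ,t1.col2
        ,t2.cnt
FROM     temp1 t1
        ,temp2 t2
ORDER BY t1.col1;
```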
Select Examples
In this first query, we don't have to list the field names (at the top) because every field already
has a name (given in the SELECT):
WITH temp1 AS                                           ANSWER
  (SELECT MAX(name) AS max_name                         ==================
         ,MAX(dept) AS max_dept                         MAX_NAME  MAX_DEPT
   FROM   staff                                         --------- --------
  )                                                     Yamaguchi       84
SELECT *
FROM   temp1;
A single query can have multiple common-table-expressions. In this next example we use two
expressions to get the department with the highest average salary:
WITH                                                    ANSWER
temp1 AS                                                ==========
  (SELECT   dept                                        MAX_AVG
           ,AVG(salary) AS avg_sal                      ----------
   FROM     staff                                       20865.8625
   GROUP BY dept),
temp2 AS
  (SELECT   MAX(avg_sal) AS max_avg
   FROM     temp1)
SELECT *
FROM   temp2;

ANSWER
==========
MAX_AVG
----------
20865.8625
Figure 766, Same as prior example, but using nested table expressions
The next query first builds a temporary table, then derives a second temporary table from the
first, and then joins the two temporary tables together. The two tables refer to the same set of
rows, and so use the same predicates. But because the second table was derived from the first,
these predicates only had to be written once. This greatly simplified the code:
WITH temp1 AS
  (SELECT   id
           ,name
           ,dept
           ,salary
   FROM     staff
   WHERE    id   <  300
     AND    dept <> 55
     AND    name LIKE 'S%'
     AND    dept NOT IN
           (SELECT deptnumb
            FROM   org
            WHERE  division = 'SOUTHERN'
               OR  location = 'HARTFORD')
  )
,temp2 AS
  (SELECT   dept
           ,MAX(salary) AS max_sal
   FROM     temp1
   GROUP BY dept
  )
SELECT   t1.id
        ,t1.dept
        ,t1.salary
        ,t2.max_sal
FROM     temp1 t1
        ,temp2 t2
WHERE    t1.dept = t2.dept
ORDER BY t1.id;

ANSWER
==========================
ID  DEPT SALARY   MAX_SAL
--- ---- -------- --------
 10   20 18357.50 18357.50
190   20 14252.75 18357.50
200   42 11508.60 11508.60
220   51 17654.50 17654.50
Insert Usage
A common table expression can be used in an insert-select-from statement to build all or part
of the set of rows that are inserted:
INSERT INTO staff
WITH temp1 (max1) AS
  (SELECT MAX(id) + 1
   FROM   staff
  )
SELECT max1,'A',1,'B',2,3,4
FROM   temp1;
Figure 769, Equivalent insert (to above) without common table expression
Full-Select
A full-select is an alternative way to define a temporary table. Instead of using a WITH clause
at the top of the statement, the temporary table definition is embedded in the body of the SQL
statement. Certain rules apply:
When used in a select statement, a full-select can either be generated in the FROM part of
the query - where it will return a temporary table, or in the SELECT part of the query -
where it will return a column of data.
When the result of a full-select is a temporary table (i.e. in FROM part of a query), the
table must be provided with a correlation name.
When the result of a full-select is a column of data (i.e. in SELECT part of query), each
reference to the temporary table must only return a single value.
The following query uses a nested table expression to get the average of an average - in this
case the average departmental salary (an average in itself) per division:
SELECT   division
        ,DEC(AVG(dept_avg),7,2) AS div_dept
        ,COUNT(*)               AS #dpts
        ,SUM(#emps)             AS #emps
FROM    (SELECT   division
                 ,dept
                 ,AVG(salary) AS dept_avg
                 ,COUNT(*)    AS #emps
         FROM     staff
                 ,org
         WHERE    dept = deptnumb
         GROUP BY division
                 ,dept
        )AS xxx
GROUP BY division;

ANSWER
==============================
DIVISION  DIV_DEPT #DPTS #EMPS
--------- -------- ----- -----
Corporate 20865.86     1     4
Eastern   15670.32     3    13
Midwest   15905.21     2     9
Western   16875.99     2     9
SELECT id                                               ANSWER
FROM  (SELECT *                                         ======
       FROM  (SELECT id, years, salary                  ID
              FROM  (SELECT *                           ---
                     FROM  (SELECT *                    170
                            FROM   staff                180
                            WHERE  dept < 77            230
                           )AS t1
                     WHERE  id < 300
                    )AS t2
              WHERE  job LIKE 'C%'
             )AS t3
       WHERE  salary < 18000
      )AS t4
WHERE  years < 5;
SELECT   a.id
        ,a.dept
        ,a.salary
        ,DEC(b.avgsal,7,2) AS avg_dept
FROM     staff a
LEFT OUTER JOIN
        (SELECT   dept        AS dept
                 ,AVG(salary) AS avgsal
         FROM     staff
         GROUP BY dept
         HAVING   AVG(salary) > 16000
        )AS b
ON       a.dept = b.dept
WHERE    a.id < 40
ORDER BY a.id;

ANSWER
=========================
ID DEPT SALARY   AVG_DEPT
-- ---- -------- --------
10   20 18357.50 16071.52
20   20 18171.25 16071.52
30   38 17506.75        -
If the full-select query has a reference to a row in a table that is outside of the full-select, then
it needs to be written as a TABLE function call. In the next example, the preceding "A" table
is referenced in the full-select, and so the TABLE function call is required:
SELECT   a.id
        ,a.dept
        ,a.salary
        ,b.deptsal
FROM     staff a
        ,TABLE
        (SELECT   b.dept
                 ,SUM(b.salary) AS deptsal
         FROM     staff b
         WHERE    b.dept = a.dept
         GROUP BY b.dept
        )AS b
WHERE    a.id < 40
ORDER BY a.id;

ANSWER
=========================
ID DEPT SALARY   DEPTSAL
-- ---- -------- --------
10   20 18357.50 64286.10
20   20 18171.25 64286.10
30   38 17506.75 77285.55
SELECT   a.id
        ,a.dept
        ,a.salary
        ,b.deptsal
FROM     staff a
        ,(SELECT   b.dept
                  ,SUM(b.salary) AS deptsal
          FROM     staff b
          GROUP BY b.dept
         )AS b
WHERE    a.id   < 40
  AND    b.dept = a.dept
ORDER BY a.id;

ANSWER
=========================
ID DEPT SALARY   DEPTSAL
-- ---- -------- --------
10   20 18357.50 64286.10
20   20 18171.25 64286.10
30   38 17506.75 77285.55
A full-select that returns a single column and row can be used in the SELECT part of a query:
SELECT   id
        ,salary
        ,(SELECT MAX(salary)
          FROM   staff
         ) AS maxsal
FROM     staff a
WHERE    id < 60
ORDER BY id;

ANSWER
====================
ID SALARY   MAXSAL
-- -------- --------
10 18357.50 22959.20
20 18171.25 22959.20
30 17506.75 22959.20
40 18006.00 22959.20
50 20659.80 22959.20
SELECT   id
        ,salary
        ,(SELECT MAX(salary)
          FROM   staff b
          WHERE  a.dept = b.dept
         ) AS maxsal
FROM     staff a
WHERE    id < 60
ORDER BY id;

ANSWER
====================
ID SALARY   MAXSAL
-- -------- --------
10 18357.50 18357.50
20 18171.25 18357.50
30 17506.75 18006.00
40 18006.00 18006.00
50 20659.80 20659.80
ANSWER
==================================
ID DEPT SALARY   4        5
-- ---- -------- -------- --------
10   20 18357.50 18357.50 22959.20
20   20 18171.25 18357.50 22959.20
30   38 17506.75 18006.00 22959.20
40   38 18006.00 18006.00 22959.20
50   15 20659.80 20659.80 22959.20
The following query uses both an uncorrelated and correlated full-select in the query that
builds the set of rows to be inserted:
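A minimal sketch of such an insert follows. It assumes the sample STAFF table with its seven columns (ID, NAME, DEPT, JOB, YEARS, SALARY, COMM), and that a row exists with an ID one hundred less than the current maximum; the literal values are made up.

```sql
INSERT INTO staff
SELECT id + 1
      ,(SELECT MIN(name)              -- uncorrelated: same for all rows
        FROM   staff)
      ,(SELECT dept                   -- correlated: depends on S1.ID
        FROM   staff s2
        WHERE  s2.id = s1.id - 100)
      ,'A',1,2,3
FROM   staff s1
WHERE  id =
      (SELECT MAX(id)                 -- uncorrelated, in the predicate
       FROM   staff);
```

If no matching row exists, the correlated full-select returns null for DEPT rather than an error.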
The following example uses an uncorrelated full-select to assign a set of workers the average
salary in the company - plus two thousand dollars.
UPDATE staff a
SET    salary =
      (SELECT AVG(salary) + 2000
       FROM   staff)
WHERE  id < 60;

ANSWER:    SALARY
=======    =================
ID DEPT    BEFORE   AFTER
-- ----    -------- --------
10   20    18357.50 18675.64
20   20    18171.25 18675.64
30   38    17506.75 18675.64
40   38    18006.00 18675.64
50   15    20659.80 18675.64
Figure 779, Use uncorrelated Full-Select to give workers company AVG salary (+$2000)
The next statement uses a correlated full-select to assign a set of workers the average salary
for their department - plus two thousand dollars. Observe that when there is more than one
worker in the same department, they all get the same new salary. This is because the
full-select is resolved before the first update was done, not after each.
UPDATE staff a
SET    salary =
      (SELECT AVG(salary) + 2000
       FROM   staff b
       WHERE  a.dept = b.dept )
WHERE  id < 60;

ANSWER:    SALARY
=======    =================
ID DEPT    BEFORE   AFTER
-- ----    -------- --------
10   20    18357.50 18071.52
20   20    18171.25 18071.52
30   38    17506.75 17457.11
40   38    18006.00 17457.11
50   15    20659.80 17482.33
Figure 780, Use correlated Full-Select to give workers department AVG salary (+$2000)
NOTE: A full-select is always resolved just once. If it is queried using a correlated expression, then the data returned each time may differ, but the table remains unchanged.
table-name
   ( column-name column-definition )
 | LIKE table-name | view-name
 | AS ( fullselect ) DEFINITION ONLY
   INCLUDING | EXCLUDING   COLUMN DEFAULTS
   EXCLUDING IDENTITY | INCLUDING IDENTITY   COLUMN ATTRIBUTES
   NOT LOGGED
SELECT COUNT(*)                                         ANSWER
FROM   session.fred;                                    ======
                                                            19

SELECT COUNT(*)                                         ANSWER
FROM   session.fred;                                    ======
                                                             0
COMMIT;

SELECT COUNT(*)                                         ANSWER
FROM   session.fred;                                    ======
                                                             8

SELECT COUNT(*)                                         ANSWER
FROM   session.fred;                                    ======
                                                             0
For a complete description of this feature, see the SQL reference. Below are some key points:
The temporary table name can be any valid DB2 table name. The qualifier, if provided,
must be SESSION. If the qualifier is not provided, it is assumed to be SESSION. If the
temporary table already exists, the WITH REPLACE clause must be used to override it.
An index can be defined on a global temporary table. The qualifier (i.e. SESSION) must
be explicitly provided.
Any column type can be used, except the following: BLOB, CLOB, DBCLOB, LONG
VARCHAR, LONG VARGRAPHIC, DATALINK, reference, and structured data types.
One can choose to preserve or delete (the default) the rows when a commit occurs.
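Some of the above points are illustrated in the following minimal sketch. The table is the SESSION.FRED table used earlier; the index name FREDX is hypothetical, and a suitable USER TEMPORARY tablespace is assumed to exist.

```sql
-- WITH REPLACE drops any existing SESSION.FRED before declaring
-- the new one. Rows are deleted at commit (the default).
DECLARE GLOBAL TEMPORARY TABLE session.fred
(dept        SMALLINT   NOT NULL
,avg_salary  DEC(7,2)   NOT NULL
,num_emps    SMALLINT   NOT NULL)
ON COMMIT DELETE ROWS
NOT LOGGED
WITH REPLACE;

-- The SESSION qualifier is given explicitly on both names.
CREATE UNIQUE INDEX session.fredx
ON session.fred (dept);
```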
Before a user can create a declared global temporary table, a USER TEMPORARY tablespace that they have access to must exist. A typical definition follows:
CREATE USER TEMPORARY TABLESPACE FRED
MANAGED BY DATABASE
USING (FILE 'C:\DB2\TEMPFRED\FRED1' 1000
      ,FILE 'C:\DB2\TEMPFRED\FRED2' 1000
      ,FILE 'C:\DB2\TEMPFRED\FRED3' 1000);
GRANT USE OF TABLESPACE FRED TO PUBLIC;
In general, do not use a Declared Global Temporary Table to hold job output data, especially
if the table is defined ON COMMIT PRESERVE ROWS. If the job fails halfway through, the
contents of the temporary table will be lost. If, prior to the failure, the job had updated and
then committed Production data, it may be impossible to recreate the lost output because the
committed rows cannot be updated twice.
Recursive SQL
Recursive SQL enables one to efficiently resolve all manner of complex logical structures
that can be really tough to work with using other techniques. On the down side, it is a little
tricky to understand at first and it is occasionally expensive. In this chapter we shall first
show how recursive SQL works and then illustrate some of the really cute things that one can
use it for.
Use Recursion To
A good SQL statement is one that gets the correct answer, is easy to understand, and is efficient. Let us assume that a particular statement is correct. If the statement uses recursive SQL,
it is never going to be categorized as easy to understand (though the reading gets much easier
with experience). However, given the question being posed, it is possible that a recursive
SQL statement is the simplest way to get the required answer.
Recursive SQL statements are neither inherently efficient nor inefficient. Because they often
involve a join, it is very important that suitable indexes be provided. Given appropriate indexes, it is quite probable that a recursive SQL statement is the most efficient way to resolve
a particular business problem. It all depends upon the nature of the question: If every row
processed by the query is required in the answer set (e.g. Find all people who work for Bob),
then a recursive statement is likely to be very efficient. If only a few of the rows processed by
the query are actually needed (e.g. Find all airline flights from Boston to Dallas, then show
only the five fastest) then the cost of resolving a large data hierarchy (or network), most of
which is immediately discarded, can be prohibitive.
If one wants to get only a small subset of rows in a large data structure, it is very important
that all of the unwanted data is excluded as soon as possible in the processing sequence. Some of
the queries illustrated in this chapter have some rather complicated code in them to do just
this. Also, always be on the lookout for infinitely looping data structures.
Conclusion
Recursive SQL statements can be very efficient, if coded correctly, and if there are suitable
indexes. When either of the above is not true, they can be very slow.
field has related child keys, and the NUM field has the number of times the child occurs
within the related parent.
HIERARCHY
+---------------+
|PKEY |CKEY |NUM|
|-----|-----|---|
|AAA |BBB | 1|
|AAA |CCC | 5|
|AAA |DDD | 20|
|CCC |EEE | 33|
|DDD |EEE | 44|
|DDD |FFF | 5|
|FFF |GGG | 5|
+---------------+
      AAA
       |
 +-----+-----+
 |     |     |
BBB   CCC   DDD
       |     |
       +-+ +-+--+
         | |    |
         EEE   FFF
                |
                |
               GGG
We want to use SQL to get a list of all the dependents of AAA. This list should include not
only those items like CCC that are directly related, but also values such as GGG, which are
indirectly related. The easiest way to answer this question (in SQL) is to use a recursive SQL
statement that goes thus:
WITH parent (pkey, ckey) AS
  (SELECT pkey, ckey
   FROM   hierarchy
   WHERE  pkey = 'AAA'
   UNION ALL
   SELECT C.pkey, C.ckey
   FROM   hierarchy C
         ,parent    P
   WHERE  P.ckey = C.pkey
  )
SELECT pkey, ckey
FROM   parent;
ANSWER
=========      PROCESSING
PKEY CKEY      SEQUENCE
---- ----      ==========
AAA  BBB    <  1st pass
AAA  CCC       ""
AAA  DDD       ""
CCC  EEE    <  2nd pass
DDD  EEE    <  3rd pass
DDD  FFF       ""
FFF  GGG    <  4th pass
The WITH statement at the top defines a temporary table called PARENT.
The upper part of the UNION ALL is only invoked once. It does an initial population of
the PARENT table with the three rows that have an immediate parent key of 'AAA'.
The lower part of the UNION ALL is run recursively until there are no more matches to
the join. In the join, the current child value in the temporary PARENT table is joined to
related parent values in the HIERARCHY table. Matching rows are placed at the front of the
temporary PARENT table. This recursive processing will stop when all of the rows in the
PARENT table have been joined to the HIERARCHY table.
The SELECT phrase at the bottom of the statement sends the contents of the PARENT
table back to the user's program.
Another way to look at the above process is to think of the temporary PARENT table as a
stack of data. This stack is initially populated by the query in the top part of the UNION ALL.
Next, a cursor starts from the bottom of the stack and goes up. Each row obtained by the cursor is joined to the HIERARCHY table. Any matching rows obtained from the join are added to the
top of the stack (i.e. in front of the cursor). When the cursor reaches the top of the stack, the
statement is done. The following diagram illustrates this process:
PKEY > AAA AAA AAA CCC DDD DDD FFF
CKEY > BBB CCC DDD EEE EEE FFF GGG
Recursive SQL requires that there be a UNION ALL phrase between the two main parts
of the statement. The UNION ALL, unlike the UNION, allows for duplicate output rows,
which is what often comes out of recursive processing.
If done right, recursive SQL is often fairly efficient. When it involves a join similar to the
example shown above, it is important to make sure that this join is efficient. To this end,
suitable indexes should be provided.
The output of a recursive SQL is a temporary table (usually). Therefore, all temporary
table usage restrictions also apply to recursive SQL output. See the section titled "Common Table Expression" for details.
The output of one recursive expression can be used as input to another recursive expression in the same SQL statement. This can be very handy if one has multiple logical hierarchies to traverse (e.g. First find all of the states in the USA, then find all of the cities in
each state).
Any recursive coding, in any language, can get into an infinite loop - either because of
bad coding, or because the data being processed has a recursive value structure. To prevent your SQL running forever, see the section titled "Halting Recursive Processing" on
page 300.
Introductory Recursion
This section will use recursive SQL statements to answer a series of simple business questions using the sample HIERARCHY table described on page 291. Be warned that things are
going to get decidedly more complex as we proceed.
List all Children #1
Find all the children of AAA. Don't worry about getting rid of duplicates, sorting the data, or
any other of the finer details.
WITH parent (ckey) AS
  (SELECT ckey
   FROM   hierarchy
   WHERE  pkey = 'AAA'
   UNION ALL
   SELECT C.ckey
   FROM   hierarchy C
         ,parent    P
   WHERE  P.ckey = C.pkey
  )
SELECT ckey
FROM   parent;
ANSWER
======
CKEY
----
BBB
CCC
DDD
EEE
EEE
FFF
GGG
HIERARCHY
+---------------+
|PKEY |CKEY |NUM|
|-----|-----|---|
|AAA |BBB | 1|
|AAA |CCC | 5|
|AAA |DDD | 20|
|CCC |EEE | 33|
|DDD |EEE | 44|
|DDD |FFF | 5|
|FFF |GGG | 5|
+---------------+
The above SQL statement uses standard recursive processing. The first part of the UNION
ALL seeds the temporary table PARENT. The second part recursively joins the temporary
table to the source data table until there are no more matches. The final part of the query displays the result set.
Imagine that the HIERARCHY table used above is very large and that we also want the above
query to be as efficient as possible. In this case, two indexes are required: the first, on PKEY,
enables the initial select to run efficiently. The second, on CKEY, makes the join in the recursive part of the query efficient. The second index is arguably more important than the first
because the first is only used once, whereas the second index is used for each child of the top-level parent.
List all Children #2
Find all the children of AAA, and include in this list the value AAA itself. To satisfy the latter
requirement we will change the first SELECT statement (in the recursive code) to select the
parent itself instead of the list of immediate children. A DISTINCT is provided in order to
ensure that only one line containing the name of the parent (i.e. "AAA") is placed into the
temporary PARENT table.
NOTE: Before the introduction of recursive SQL processing, it often made sense to define
the top-most level in a hierarchical data structure as being a parent-child of itself. For example, the HIERARCHY table might contain a row indicating that "AAA" is a child of
"AAA". If the target table has data like this, add another predicate: C.PKEY <> C.CKEY to
the recursive part of the SQL statement to stop the query from looping forever.
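The query described above might be coded as follows. This is a sketch put together from the description, modeled on the List all Children #1 statement. The only change is that the first SELECT returns the parent key itself, with a DISTINCT so that 'AAA' is seeded once rather than three times. The commented-out predicate is the one suggested in the NOTE for tables where the top-level row is a parent of itself.

```sql
WITH parent (ckey) AS
-- seed the temporary table with the parent itself, once only
  (SELECT DISTINCT pkey
   FROM   hierarchy
   WHERE  pkey = 'AAA'
   UNION ALL
-- then add each child of the rows found so far
   SELECT C.ckey
   FROM   hierarchy C
         ,parent    P
   WHERE  P.ckey = C.pkey
-- AND    C.pkey <> C.ckey
  )
SELECT ckey
FROM   parent;
```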
ANSWER
======
CKEY
----
AAA
BBB
CCC
DDD
EEE
EEE
FFF
GGG
HIERARCHY
+---------------+
|PKEY |CKEY |NUM|
|-----|-----|---|
|AAA |BBB | 1|
|AAA |CCC | 5|
|AAA |DDD | 20|
|CCC |EEE | 33|
|DDD |EEE | 44|
|DDD |FFF | 5|
|FFF |GGG | 5|
+---------------+
Get a distinct list of all the children of AAA. This query differs from the prior only in the use
of the DISTINCT phrase in the final select.
WITH parent (ckey) AS
  (SELECT DISTINCT pkey
   FROM   hierarchy
   WHERE  pkey = 'AAA'
   UNION ALL
   SELECT C.ckey
   FROM   hierarchy C
         ,parent    P
   WHERE  P.ckey = C.pkey
  )
SELECT DISTINCT ckey
FROM   parent;
ANSWER
======
CKEY
----
AAA
BBB
CCC
DDD
EEE
FFF
GGG
HIERARCHY
+---------------+
|PKEY |CKEY |NUM|
|-----|-----|---|
|AAA |BBB | 1|
|AAA |CCC | 5|
|AAA |DDD | 20|
|CCC |EEE | 33|
|DDD |EEE | 44|
|DDD |FFF | 5|
|FFF |GGG | 5|
+---------------+
Get a list of all the children of AAA. For each value returned, show its level in the logical
hierarchy relative to AAA.
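A statement along the following lines would do the job. It is a sketch built on the same pattern as the other queries in this section: the seed row gets level zero, and each recursive pass adds one to the level of the row that found the child.

```sql
WITH parent (ckey, lvl) AS
-- the starting value is level zero
  (SELECT DISTINCT pkey, 0
   FROM   hierarchy
   WHERE  pkey = 'AAA'
   UNION ALL
-- each child is one level below the row that found it
   SELECT C.ckey, P.lvl + 1
   FROM   hierarchy C
         ,parent    P
   WHERE  P.ckey = C.pkey
  )
SELECT ckey, lvl
FROM   parent;
```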
ANSWER
========
CKEY LVL
---- ---
AAA    0
BBB    1
CCC    1
DDD    1
EEE    2
EEE    2
FFF    2
GGG    3
      AAA
       |
 +-----+-----+
 |     |     |
BBB   CCC   DDD
       |     |
       +-+ +-+--+
         | |    |
         EEE   FFF
                |
                |
               GGG
Get a list of all the children of AAA that are less than three levels below AAA.
WITH parent (ckey, lvl) AS
  (SELECT DISTINCT pkey, 0
   FROM   hierarchy
   WHERE  pkey = 'AAA'
   UNION ALL
   SELECT C.ckey, P.lvl + 1
   FROM   hierarchy C
         ,parent    P
   WHERE  P.ckey = C.pkey
  )
SELECT ckey, lvl
FROM   parent
WHERE  lvl < 3;
ANSWER
========
CKEY LVL
---- ---
AAA    0
BBB    1
CCC    1
DDD    1
EEE    2
EEE    2
FFF    2
HIERARCHY
+---------------+
|PKEY |CKEY |NUM|
|-----|-----|---|
|AAA |BBB | 1|
|AAA |CCC | 5|
|AAA |DDD | 20|
|CCC |EEE | 33|
|DDD |EEE | 44|
|DDD |FFF | 5|
|FFF |GGG | 5|
+---------------+
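The above statement has two problems: It will run forever if the data contains an infinite loop, and it may be inefficient because it resolves the whole hierarchy before discarding those levels that are not required.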
To get around both of these problems, we can move the level check up into the body of the
recursive statement. This will stop the recursion from continuing as soon as we reach the target level. We will have to add "+ 1" to the check to make it logically equivalent:
WITH parent (ckey, lvl) AS
  (SELECT DISTINCT pkey, 0
   FROM   hierarchy
   WHERE  pkey = 'AAA'
   UNION ALL
   SELECT C.ckey, P.lvl + 1
   FROM   hierarchy C
         ,parent    P
   WHERE  P.ckey    = C.pkey
     AND  P.lvl + 1 < 3
  )
SELECT ckey, lvl
FROM   parent;
ANSWER
========
CKEY LVL
---- ---
AAA    0
BBB    1
CCC    1
DDD    1
EEE    2
EEE    2
FFF    2
      AAA
       |
 +-----+-----+
 |     |     |
BBB   CCC   DDD
       |     |
       +-+ +-+--+
         | |    |
         EEE   FFF
                |
                |
               GGG
The only difference between this statement and the one before is that the level check is now
done in the recursive part of the statement. This new level-check predicate has a dual function: It gives us the answer that we want, and it stops the SQL from running forever if the
database happens to contain an infinite loop (e.g. DDD was also a parent of AAA).
One problem with this general statement design is that it can not be used to list only that data
which pertains to a certain lower level (e.g. display only level 3 data). To answer this kind of
question efficiently we can combine the above two queries, having appropriate predicates in
both places (see next).
Select Explicit Level
Get a list of all the children of AAA that are exactly two levels below AAA.
WITH parent (ckey, lvl) AS
  (SELECT DISTINCT pkey, 0
   FROM   hierarchy
   WHERE  pkey = 'AAA'
   UNION ALL
   SELECT C.ckey, P.lvl + 1
   FROM   hierarchy C
         ,parent    P
   WHERE  P.ckey    = C.pkey
     AND  P.lvl + 1 < 3
  )
SELECT ckey, lvl
FROM   parent
WHERE  lvl = 2;
ANSWER
========
CKEY LVL
---- ---
EEE    2
EEE    2
FFF    2
HIERARCHY
+---------------+
|PKEY |CKEY |NUM|
|-----|-----|---|
|AAA |BBB | 1|
|AAA |CCC | 5|
|AAA |DDD | 20|
|CCC |EEE | 33|
|DDD |EEE | 44|
|DDD |FFF | 5|
|FFF |GGG | 5|
+---------------+
Multiple recursive joins can be included in a single query. The joins can run independently, or
the output from one recursive join can be used as input to a subsequent one. Such code enables
one to do the following:
Expand multiple hierarchies in a single query. For example, one might first get a list of
all departments (direct and indirect) in a particular organization, and then use the department list as a seed to find all employees (direct and indirect) in each department.
Go down, and then up, a given hierarchy in a single query. For example, one might want
to find all of the children of AAA, and then all of the parents. The combined result is the
list of objects that AAA is related to via a direct parent-child path.
Go down the same hierarchy twice, and then combine the results to find the matches, or
the non-matches. This type of query might be used to, for example, see if two companies
own shares in the same subsidiary.
The next example recursively searches the HIERARCHY table for all values that are either a
child or a parent (direct or indirect) of the object DDD. The first part of the query gets the list
of children, the second part gets the list of parents (but never the value DDD itself), and then
the results are combined.
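One way to code such a query is sketched below. The temporary table names CHILDREN and PARENTS are made up for this illustration, and the KKEY and LVL column names are taken from the answer set. Children are given positive level numbers and parents negative ones. The ORDER BY is an addition that lists the rows by level.

```sql
WITH children (kkey, lvl) AS
-- go down: everything that DDD is a direct or indirect parent of
  (SELECT ckey, 1
   FROM   hierarchy
   WHERE  pkey = 'DDD'
   UNION ALL
   SELECT H.ckey, C.lvl + 1
   FROM   hierarchy H
         ,children  C
   WHERE  H.pkey = C.kkey
  )
,parents (kkey, lvl) AS
-- go up: everything that DDD is a direct or indirect child of
  (SELECT pkey, -1
   FROM   hierarchy
   WHERE  ckey = 'DDD'
   UNION ALL
   SELECT H.pkey, P.lvl - 1
   FROM   hierarchy H
         ,parents   P
   WHERE  H.ckey = P.kkey
  )
SELECT   kkey, lvl
FROM     children
UNION ALL
SELECT   kkey, lvl
FROM     parents
ORDER BY lvl, kkey;
```

Neither seed query selects a row where DDD is the returned key, which is why the value DDD itself never shows up in the output.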
ANSWER
========
KKEY LVL
---- ---
AAA   -1
EEE    1
FFF    1
GGG    2
      AAA
       |
 +-----+-----+
 |     |     |
BBB   CCC   DDD
       |     |
       +-+ +-+--+
         | |    |
         EEE   FFF
                |
                |
               GGG
Some recursive SQL statements generate the following warning when the DB2 parser has
reason to suspect that the statement may run forever:
SQL0347W The recursive common table expression "GRAEME.TEMP1" may contain an
infinite loop. SQLSTATE=01605
The text that accompanies this message provides detailed instructions on how to code recursive SQL so as to avoid getting into an infinite loop. The trouble is that even if you do exactly
as told you may still get the silly message. To illustrate, the following two SQL statements
are almost identical. Yet the first gets a warning and the second does not:
WITH temp1 (n1) AS
  (SELECT id
   FROM   staff
   WHERE  id = 10
   UNION ALL
   SELECT n1 + 10
   FROM   temp1
   WHERE  n1 < 50
  )
SELECT *
FROM   temp1;
ANSWER
======
N1
--
warn
10
20
30
40
50
ANSWER
======
N1
--
10
20
30
40
50
DIVERGENT
=========
  AAA
   |
 +-+-+
 |   |
BBB CCC
     |
   +-+-+
   |   |
  DDD EEE
CONVERGENT
==========
  AAA
   |
 +-+-+
 |   |
BBB CCC
 |   |
 +-+-+-+
   |   |
  DDD EEE
RECURSIVE
=========
  AAA<--+
   |    |
 +-+-+  |
 |   |  |
BBB CCC>+
     |
   +-+-+
   |   |
  DDD EEE
BALANCED
========
  AAA
   |
 +-+-+
 |   |
BBB CCC
 |   |
 |   +---+
 |   |   |
DDD EEE FFF
UNBALANCED
==========
  AAA
   |
 +-+-+
 |   |
BBB CCC
     |
   +-+-+
   |   |
  DDD EEE
In this flavour of hierarchy, no object has more than one parent. Each object can have none,
one, or more than one, dependent child objects. Physical objects (e.g. Geographic entities)
tend to be represented in this type of hierarchy.
This type of hierarchy will often incorporate the concept of different layers in the hierarchy
referring to differing kinds of object - each with its own set of attributes. For example, a Geographic hierarchy might consist of countries, states, cities, and street addresses.
A single table can be used to represent this kind of hierarchy in a fully normalized form. One
field in the table will be the unique key, another will point to the related parent. Other fields
in the table may pertain either to the object in question, or to the relationship between the object and its parent. For example, in the following table the PRICE field has the price of the
object, and the NUM field has the number of times that the object occurs in the parent.
OBJECTS_RELATES
+---------------------+
|KEYO |PKEY |NUM|PRICE|
|-----|-----|---|-----|
|AAA  |     |   |  $10|
|BBB  |AAA  |  1|  $21|
|CCC  |AAA  |  5|  $23|
|DDD  |AAA  | 20|  $25|
|EEE  |DDD  | 44|  $33|
|FFF  |DDD  |  5|  $34|
|GGG  |FFF  |  5|  $44|
+---------------------+
      AAA
       |
 +-----+-----+
 |     |     |
BBB   CCC   DDD
             |
          +--+--+
          |     |
         EEE   FFF
                |
                |
               GGG
Some database designers like to make the arbitrary judgment that every object has a parent,
and in those cases where there is no "real" parent, the object is considered to be a parent of itself. In the above table, this would mean that AAA would be defined as a parent of AAA.
Please appreciate that this judgment call does not affect the objects that the database represents, but it can have a dramatic impact on SQL usage and performance.
Prior to the introduction of recursive SQL, defining top level objects as being self-parenting
was sometimes a good idea because it enabled one to resolve a hierarchy using a simple join
without unions. This same process is now best done with recursive SQL. Furthermore, if objects in the database are defined as self-parenting, the recursive SQL will get into an infinite
loop unless extra predicates are provided.
Convergent Hierarchy
NUMBER OF TABLES: A convergent hierarchy has many-to-many relationships that require two tables for normalized data storage. The other hierarchy types require but a single table.
In this flavour of hierarchy, each object can have none, one, or more than one, parent and/or
dependent child objects. Convergent hierarchies are often much more difficult to work with
than similar divergent hierarchies. Logical entities, or man-made objects, (e.g. Company Divisions) often have this type of hierarchy.
Two tables are required in order to represent this kind of hierarchy in a fully normalized form.
One table describes the object, and the other describes the relationships between the objects.
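The two-table design might be defined as shown below. This is only a sketch: the table and column names come from the sample tables, but the data types, the primary keys, and the foreign keys are assumptions made for the illustration.

```sql
-- one row per object (data types are assumed)
CREATE TABLE objects
(keyo    CHAR(3)       NOT NULL
,price   DECIMAL(7,2)  NOT NULL
,PRIMARY KEY(keyo));

-- one row per parent-child relationship between two objects
CREATE TABLE relationships
(pkey    CHAR(3)       NOT NULL
,ckey    CHAR(3)       NOT NULL
,num     SMALLINT      NOT NULL
,PRIMARY KEY(pkey,ckey)
,FOREIGN KEY(pkey) REFERENCES objects (keyo)
,FOREIGN KEY(ckey) REFERENCES objects (keyo));
```

Because the key of the relationships table is the parent-child pair, an object such as EEE can be recorded under more than one parent, which is what makes the hierarchy convergent.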
OBJECTS
+-----------+
|KEYO |PRICE|
|-----|-----|
|AAA | $10|
|BBB | $21|
|CCC | $23|
|DDD | $25|
|EEE | $33|
|FFF | $34|
|GGG | $44|
+-----------+
RELATIONSHIPS
+---------------+
|PKEY |CKEY |NUM|
|-----|-----|---|
|AAA |BBB | 1|
|AAA |CCC | 5|
|AAA |DDD | 20|
|CCC |EEE | 33|
|DDD |EEE | 44|
|DDD |FFF | 5|
|FFF |GGG | 5|
+---------------+
      AAA
       |
 +-----+-----+
 |     |     |
BBB   CCC   DDD
       |     |
       +-+ +-+--+
         | |    |
         EEE   FFF
                |
                |
               GGG
In this flavour of hierarchy, each object can have none, one, or more than one parent. Also,
each object can be a parent and/or a child of itself via another object, or via itself directly. In
the business world, this type of hierarchy is almost always wrong. When it does exist, it is
often because a standard convergent hierarchy has gone a bit haywire.
This database design is exactly the same as the one for a convergent hierarchy. Two tables are
(usually) required in order to represent the hierarchy in a fully normalized form. One table
describes the object, and the other describes the relationships between the objects.
OBJECTS
+-----------+
|KEYO |PRICE|
|-----|-----|
|AAA | $10|
|BBB | $21|
|CCC | $23|
|DDD | $25|
|EEE | $33|
|FFF | $34|
|GGG | $44|
+-----------+
RELATIONSHIPS
+---------------+
|PKEY |CKEY |NUM|
|-----|-----|---|
|AAA |BBB | 1|
|AAA |CCC | 5|
|AAA |DDD | 20|
|CCC |EEE | 33|
|DDD |AAA | 99|
|DDD |FFF | 5|
|DDD |EEE | 44|
|FFF |GGG | 5|
+---------------+
      AAA <------+
       |         |
 +-----+-----+   |
 |     |     |   |
BBB   CCC   DDD>-+
       |     |
       +-+ +-+--+
         | |    |
         EEE   FFF
                |
                |
               GGG
In some logical hierarchies the distance, in terms of the number of intervening levels, from
the top parent entity to its lowest-level child entities is the same for all legs of the hierarchy.
Such a hierarchy is considered to be balanced. An unbalanced hierarchy is one where the distance from a top-level parent to a lowest-level child is potentially different for each leg of the
hierarchy.
BALANCED
========
      AAA
       |
 +-----+-----+
 |     |     |
BBB   CCC   DDD
 |     |     |
 |     |   +-+-+
 |     |   |   |
EEE   FFF GGG HHH
UNBALANCED
==========
       AAA
        |
    +---+-----+
    |   |     |
    |  CCC   DDD
    |   |     |
    |   +-+ +-+-+
    |     | |   |
   FFF    GGG  HHH
                |
                |
               III
The difference between a data and a pointer hierarchy is not one of design, but of usage. In a
pointer schema, the main application tables do not store a description of the logical hierarchy.
Instead, they only store the base data. Separate to the main tables are one, or more, related
tables that define which hierarchies each base data row belongs to.
Typically, in a pointer hierarchy, the main data tables are much larger and more active than
the hierarchical tables. A banking application is a classic example of this usage pattern. There
is often one table that contains core customer information and several related tables that enable one to do analysis by customer category.
A data hierarchy is an altogether different beast. An example would be a set of tables that
contain information on all the parts that make up an aircraft. In this kind of application the
most important information in the database is often that which pertains to the relationships
between objects. These tend to be very complicated, often incorporating the attributes: quantity, direction, and version.
Recursive processing of a data hierarchy will often require that one does a lot more than just
find all dependent keys. For example, to find the gross weight of an aircraft from such a database one will have to work with both the quantity and weight of all dependent objects. Those
objects that span sub-assemblies (e.g. a bolt connecting the engine to the wing) must not be
counted twice, missed out, nor assigned to the wrong sub-grouping. As always, such questions are essentially easy to answer, the trick is to get the right answer.
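To give a feel for this kind of processing, the following sketch uses the OBJECTS and RELATIONSHIPS sample tables shown earlier, with PRICE standing in for weight. It assumes that NUM is a SMALLINT or INTEGER column and that PRICE is numeric. The temporary table name EXPLODE and the output column names are invented for the example. The quantity is multiplied down each leg of the hierarchy, and the final select adds up the legs that arrive at the same object.

```sql
WITH explode (ckey, qty) AS
-- direct children of AAA, with the quantity held as an integer
  (SELECT ckey
         ,INTEGER(num)
   FROM   relationships
   WHERE  pkey = 'AAA'
   UNION ALL
-- quantity of an indirect child = quantity of its parent * its own NUM
   SELECT R.ckey
         ,E.qty * R.num
   FROM   relationships R
         ,explode       E
   WHERE  R.pkey = E.ckey
  )
SELECT   E.ckey
        ,SUM(E.qty)           AS total_qty
        ,SUM(E.qty * O.price) AS total_price
FROM     explode E
        ,objects O
WHERE    O.keyo = E.ckey
GROUP BY E.ckey
ORDER BY E.ckey;
```

With the sample data, EEE is reached twice: via CCC (5 * 33 = 165) and via DDD (20 * 44 = 880). The GROUP BY combines the two legs into a total quantity of 1045. Whether a real application should add the legs together like this depends on what the data means.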
Keep a record of where you have been, and if you ever come back, either fail or in some
other way stop recursive processing.
Keep a record of where you have been, and if you ever come back, simply ignore that
row and keep on resolving the rest of hierarchy.
The following table is a normalized representation of the recursive hierarchy on the right.
Note that AAA and DDD are both a parent and a child of each other.
TROUBLE
+---------+
|PKEY|CKEY|
|----|----|
|AAA |BBB |
|AAA |CCC |
|AAA |DDD |
|CCC |EEE |
|DDD |AAA |
|DDD |FFF |
|DDD |EEE |
|FFF |GGG |
+---------+
      AAA <------+
       |         |
 +-----+-----+   |
 |     |     |   |
BBB   CCC   DDD>-+
       |     |
       +-+ +-+--+
         | |    |
         EEE   FFF
                |
                |
               GGG
CREATE TABLE trouble
(pkey   CHAR(3)   NOT NULL
,ckey   CHAR(3)   NOT NULL);
In the above table, the beginning object (i.e. AAA) is part of the data loop. This type of loop
can be detected using simpler SQL than what is given here. But a loop that does not include
the beginning object (e.g. AAA points to BBB, which points to CCC, which points back to
BBB) requires the somewhat complicated SQL that is used in this section.
Stop After "n" Levels
Find all the children of AAA. In order to avoid running forever, stop after four levels.
WITH parent (pkey, ckey, lvl) AS
  (SELECT DISTINCT
          pkey
         ,pkey
         ,0
   FROM   trouble
   WHERE  pkey = 'AAA'
   UNION ALL
   SELECT C.pkey
         ,C.ckey
         ,P.lvl + 1
   FROM   trouble C
         ,parent  P
   WHERE  P.ckey    = C.pkey
     AND  P.lvl + 1 < 4
  )
SELECT *
FROM   parent;
ANSWER
=============
PKEY CKEY LVL
---- ---- ---
AAA  AAA    0
AAA  BBB    1
AAA  CCC    1
AAA  DDD    1
CCC  EEE    2
DDD  AAA    2
DDD  EEE    2
DDD  FFF    2
AAA  BBB    3
AAA  CCC    3
AAA  DDD    3
FFF  GGG    3
TROUBLE
+---------+
|PKEY|CKEY|
|----|----|
|AAA |BBB |
|AAA |CCC |
|AAA |DDD |
|CCC |EEE |
|DDD |AAA |
|DDD |FFF |
|DDD |EEE |
|FFF |GGG |
+---------+
A far better way to stop recursive processing is to halt when, and only when, we determine
that we have been to the target row previously. To do this, we need to maintain a record of
where we have been, and then check this record against the current key value in each row
joined to. DB2 does not come with an in-built function that can do this checking, so we shall
define our own.
Define Function
Below is the definition code for a user-defined DB2 function that is very similar to the standard LOCATE function. It searches for one string in another, block by block. For example, if
one was looking for the string "ABC", this function would search the first three bytes, then
the next three bytes, and so on. If a match is found, the function returns the relevant block
number, else zero.
CREATE FUNCTION LOCATE_BLOCK(searchstr VARCHAR(30000)
                            ,lookinstr VARCHAR(30000))
RETURNS INTEGER
BEGIN ATOMIC
   DECLARE lookinlen, searchlen INT;
   DECLARE locatevar, returnvar INT DEFAULT 0;
   DECLARE beginlook            INT DEFAULT 1;
   SET lookinlen = LENGTH(lookinstr);
   SET searchlen = LENGTH(searchstr);
   WHILE locatevar  = 0          AND
         beginlook <= lookinlen  DO
      SET locatevar = LOCATE(searchstr,SUBSTR(lookinstr
                                             ,beginlook
                                             ,searchlen));
      SET beginlook = beginlook + searchlen;
      SET returnvar = returnvar + 1;
   END WHILE;
   IF locatevar = 0 THEN
      SET returnvar = 0;
   END IF;
   RETURN returnvar;
END
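The difference between the two functions can be seen by running them side by side. The statement below is a sketch that feeds two literal names through both functions, so it does not depend on any table. The column names L1 and L2 are just labels.

```sql
SELECT name
      ,LOCATE('th',name)       AS l1
      ,LOCATE_BLOCK('th',name) AS l2
FROM   (VALUES ('Rothman')
              ,('Smith')) AS T (name);
```

In "Rothman" the search string sits exactly on the second two-byte block, so LOCATE_BLOCK should return 2. In "Smith" it straddles the second and third blocks, so LOCATE finds it (at byte 4) but LOCATE_BLOCK should return zero.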
ANSWER
=================
ID  NAME    L1 L2
--- ------- -- --
 70 Rothman  3  2
220 Smith    4  0
Now all we need to do is build a string, as we do the recursion, that holds every key value that
has previously been accessed. This can be done using simple concatenation:
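A sketch of the idea follows. The column names KEYS_SEEN and the choice of a twenty byte string are assumptions made for the example. The explicit VARCHAR call in the recursive part is there to keep the column the same type and length as in the seed row. The string must be made long enough to hold the longest possible chain of keys.

```sql
WITH parent (pkey, ckey, lvl, keys_seen) AS
-- start the list of visited keys with the top-level key
  (SELECT DISTINCT
          pkey
         ,pkey
         ,0
         ,VARCHAR(pkey,20)
   FROM   trouble
   WHERE  pkey = 'AAA'
   UNION ALL
-- append each new key, and skip any key already on this leg
   SELECT C.pkey
         ,C.ckey
         ,P.lvl + 1
         ,VARCHAR(P.keys_seen || C.ckey,20)
   FROM   trouble C
         ,parent  P
   WHERE  P.ckey = C.pkey
     AND  LOCATE_BLOCK(C.ckey,P.keys_seen) = 0
  )
SELECT *
FROM   parent;
```

Because the keys are all three bytes long, checking block by block ensures that a match is only found on a whole key, never on the tail of one key plus the head of the next. In this version a row that points back to a key already visited (e.g. DDD to AAA) is simply ignored, and the rest of the hierarchy is resolved as usual.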
ANSWER
=========
PKEY CKEY
---- ----
DDD  AAA
TROUBLE
+---------+
|PKEY|CKEY|
|----|----|
|AAA |BBB |
|AAA |CCC |
|AAA |DDD |
|CCC |EEE |
|DDD |AAA |
|DDD |FFF |
|DDD |EEE |
|FFF |GGG |
+---------+
DELETE
FROM   trouble
WHERE  (pkey,ckey) IN
       (SELECT pkey, ckey
        FROM   SESSION.del_list);
TROUBLE
+---------+
|PKEY|CKEY|
|----|----|
|AAA |BBB |
|AAA |CCC |
|AAA |DDD |
|CCC |EEE |
|DDD |AAA |
|DDD |FFF |
|DDD |EEE |
|FFF |GGG |
+---------+
      AAA <------+
       |         |
 +-----+-----+   |
 |     |     |   |
BBB   CCC   DDD>-+
       |     |
       +-+ +-+--+
         | |    |
         EEE   FFF
                |
                |
               GGG
The LOCATE_BLOCK solution shown above works fine, as long as the key in question is a
fixed length character field. If it isn't, it can be converted to one, depending on what it is:
Rather than go searching for loops, one can toss in a couple of triggers that will prevent the
table from ever getting data loops in the first place. There will be one trigger for inserts, and
another for updates. Both will have the same general logic:
Recursively scan the existing rows, starting with the new CKEY value.
Compare each existing CKEY value retrieved to the new PKEY value. If it matches, the
changed row will cause a loop, so flag an error.
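The logic above could be coded as an insert trigger along the following lines. This is a sketch only: the trigger name TBL_INS, the SQLSTATE value, and the message text are made up for the example. It also assumes that the table is free of loops before the trigger is created, because an existing loop could keep the recursion going forever.

```sql
CREATE TRIGGER tbl_ins
NO CASCADE BEFORE INSERT ON trouble
REFERENCING NEW AS NNN
FOR EACH ROW MODE DB2SQL
-- start at the new row, then walk down through the existing children
   WITH temp (pkey, ckey) AS
     (VALUES (NNN.pkey
             ,NNN.ckey)
      UNION ALL
      SELECT T.pkey
            ,CASE
-- an existing child that matches the new parent closes a loop
                WHEN R.ckey = NNN.pkey
                THEN RAISE_ERROR('70001','LOOP FOUND')
                ELSE R.ckey
             END
      FROM   trouble R
            ,temp    T
      WHERE  R.pkey = T.ckey
     )
   SELECT *
   FROM   temp;
```

The update trigger would have the same body, but be defined as BEFORE UPDATE OF pkey, ckey. Note that the sketch does not check for a new row that is a parent of itself; a simple WHEN clause or check constraint can do that.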
TROUBLE
+---------+
|PKEY|CKEY|
|----|----|
|AAA |BBB |
|AAA |CCC |
|AAA |DDD |
|CCC |EEE |
|DDD |AAA |
|DDD |FFF |
|DDD |EEE |
|FFF |GGG |
+---------+
One of the more difficult problems in any relational database system involves joining across
multiple hierarchical data structures. The task is doubly difficult when one or more of the hierarchies involved is a data structure that has to be resolved using recursive processing. In this
section, we will describe how one can use a mixture of tables and triggers to answer this kind
of query very efficiently.
A typical question might go as follows: Find all matching rows where the customer is in some
geographic region, and the item sold is in some product category, and the person who made the
sale is in some company sub-structure. If each of these qualifications involves expanding a
hierarchy of object relationships of indeterminate and/or nontrivial depth, then a simple join
or standard data denormalization will not work.
In DB2, one can answer this kind of question by using recursion to expand each of the data
hierarchies. Then the query would join (sans indexes) the various temporary tables created by
the recursive code to whatever other data tables needed to be accessed. Unfortunately, the
performance will probably be lousy.
Alternatively, one can often efficiently answer this general question using a set of suitably
indexed summary tables that are an expanded representation of each data hierarchy. With
these tables, the DB2 optimizer can much more efficiently join to other data tables, and so
deliver suitable performance.
In this section, we will show how to make these summary tables and, because it is a prerequisite, also show how to ensure that the related base tables do not have recursive data structures.
Two solutions will be described: One that is simple and efficient, but which stops updates to
key values. And another that imposes fewer constraints, but which is a bit more complicated.
Limited Update Solution
Below on the left is a hierarchy of data items. This is a typical unbalanced, non-recursive data
hierarchy. In the center is a normalized representation of this hierarchy. The only thing that is
perhaps a little unusual here is that an item at the top of a hierarchy (e.g. AAA) is deemed to
be a parent of itself. On the right is an exploded representation of the same hierarchy.
AAA
 |
BBB
 |
 +-----+
 |     |
CCC   EEE
 |
DDD
HIERARCHY#1
+--------------------+
|KEYY|PKEY|DATA      |
|----|----|----------|
|AAA |AAA |SOME DATA |
|BBB |AAA |MORE DATA |
|CCC |BBB |MORE JUNK |
|DDD |CCC |MORE JUNK |
|EEE |BBB |JUNK DATA |
+--------------------+
EXPLODED#1
+-------------+
|PKEY|CKEY|LVL|
|----|----|---|
|AAA |AAA | 0|
|AAA |BBB | 1|
|AAA |CCC | 2|
|AAA |DDD | 3|
|AAA |EEE | 2|
|BBB |BBB | 0|
|BBB |CCC | 1|
|BBB |DDD | 2|
|BBB |EEE | 1|
|CCC |CCC | 0|
|CCC |DDD | 1|
|DDD |DDD | 0|
|EEE |EEE | 0|
+-------------+
Below is the CREATE code for the above normalized table and a dependent trigger:
CREATE TABLE hierarchy#1
(keyy   CHAR(3)     NOT NULL
,pkey   CHAR(3)     NOT NULL
,data   VARCHAR(10)
,CONSTRAINT hierarchy11 PRIMARY KEY(keyy)
,CONSTRAINT hierarchy12 FOREIGN KEY(pkey)
 REFERENCES hierarchy#1 (keyy) ON DELETE CASCADE);
CREATE TRIGGER HIR#1_UPD
NO CASCADE BEFORE UPDATE OF pkey ON hierarchy#1
REFERENCING NEW AS NNN
            OLD AS OOO
FOR EACH ROW MODE DB2SQL
WHEN (NNN.pkey <> OOO.pkey)
   SIGNAL SQLSTATE '70001' ('CAN NOT UPDATE pkey');
Figure 822, Hierarchy table that does not allow updates to PKEY
Note the following:
The KEYY column is the primary key, which ensures that each value must be unique,
and that this field can not be updated.
The PKEY column is a foreign key of the KEYY column. This means that this field must
always refer to a valid KEYY value. This value can either be in another row (if the new
row is being inserted at the bottom of an existing hierarchy), or in the new row itself (if a
new independent data hierarchy is being established).
The ON DELETE CASCADE referential integrity rule ensures that when a row is deleted, all dependent rows are also deleted.
The TRIGGER prevents any updates to the PKEY column. This is a BEFORE trigger,
which means that it stops the update before it is applied to the database.
All of the above rules and restrictions act to prevent either an insert or an update from ever acting on any row that is not at the bottom of a hierarchy. Consequently, it is not possible for a
hierarchy to ever exist that contains a loop of multiple data items.
Creating an Exploded Equivalent
Once we have ensured that the above table can never have recursive data structures, we can
define a dependent table that holds an exploded version of the same hierarchy. Triggers will
be used to keep the two tables in sync. Here is the CREATE code for the table:
CREATE TABLE exploded#1
(pkey   CHAR(4)    NOT NULL
,ckey   CHAR(4)    NOT NULL
,lvl    SMALLINT   NOT NULL
,PRIMARY KEY(pkey,ckey));
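A delete trigger to keep the exploded table in step with the hierarchy table might look something like the sketch below. The trigger name EXP#1_DEL is an assumption. The trigger relies on the ON DELETE CASCADE rule defined on the hierarchy table: when a row is deleted, every dependent row is deleted as well, and the trigger is run once for each of them.

```sql
CREATE TRIGGER EXP#1_DEL
AFTER DELETE ON hierarchy#1
REFERENCING OLD AS OOO
FOR EACH ROW MODE DB2SQL
-- remove every exploded row in which the deleted key is the child
   DELETE
   FROM   exploded#1
   WHERE  ckey = OOO.keyy;
```

Exploded rows in which the deleted key is the parent do not need separate handling. Each such row has a dependent object as its child, and that object is itself removed by the cascading delete.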
Figure 824, Trigger to maintain exploded table after delete in hierarchy table
The next trigger is run every time a row is inserted into the hierarchy table. It uses recursive
code to scan the hierarchy table upwards, looking for all parents of the new row. The result set is then inserted into the exploded table:
CREATE TRIGGER EXP#1_INS
AFTER INSERT ON hierarchy#1
REFERENCING NEW AS NNN
FOR EACH ROW MODE DB2SQL
   INSERT
   INTO   exploded#1
   WITH temp(pkey, ckey, lvl) AS
     (VALUES (NNN.keyy
             ,NNN.keyy
             ,0)
      UNION ALL
      SELECT N.pkey
            ,NNN.keyy
            ,T.lvl + 1
      FROM   temp        T
            ,hierarchy#1 N
      WHERE  N.keyy  = T.pkey
        AND  N.keyy <> N.pkey
     )
   SELECT *
   FROM   temp;
HIERARCHY#1
+--------------+
|KEYY|PKEY|DATA|
|----|----|----|
|AAA |AAA |S...|
|BBB |AAA |M...|
|CCC |BBB |M...|
|DDD |CCC |M...|
|EEE |BBB |J...|
+--------------+
EXPLODED#1
+-------------+
|PKEY|CKEY|LVL|
|----|----|---|
|AAA |AAA | 0|
|AAA |BBB | 1|
|AAA |CCC | 2|
|AAA |DDD | 3|
|AAA |EEE | 2|
|BBB |BBB | 0|
|BBB |CCC | 1|
|BBB |DDD | 2|
|BBB |EEE | 1|
|CCC |CCC | 0|
|CCC |DDD | 1|
|DDD |DDD | 0|
|EEE |EEE | 0|
+-------------+
Figure 825, Trigger to maintain exploded table after insert in hierarchy table
There is no update trigger because updates are not allowed to the hierarchy table.
Querying the Exploded Table
Once supplied with suitable indexes, the exploded table can be queried like any other table. It
will always return the current state of the data in the related hierarchy table.
SELECT   *
FROM     exploded#1
WHERE    pkey = :host-var
ORDER BY pkey
        ,ckey
        ,lvl;
Not all applications want to limit updates to the data hierarchy as was done above. In particular, they may want the user to be able to move an object, and all its dependents, from one
valid point (in a data hierarchy) to another. This means that we cannot prevent valid updates
to the PKEY value.
Below is the CREATE statement for a second hierarchy table. The only difference between
this table and the previous one is that there is now an ON UPDATE RESTRICT clause. This
prevents updates to PKEY that do not point to a valid KEYY value either in another row, or
in the row being updated:
CREATE TABLE hierarchy#2
(keyy   CHAR(3)     NOT NULL
,pkey   CHAR(3)     NOT NULL
,data   VARCHAR(10)
,CONSTRAINT NO_loopS21 PRIMARY KEY(keyy)
,CONSTRAINT NO_loopS22 FOREIGN KEY(pkey)
 REFERENCES hierarchy#2 (keyy) ON DELETE CASCADE
                               ON UPDATE RESTRICT);
Recursive SQL
The previous hierarchy table came with a trigger that prevented all updates to the PKEY field.
This table comes instead with a trigger that checks to see that such updates do not result in a
recursive data structure. It starts out at the changed row, then works upwards through the
chain of PKEY values. If it ever comes back to the original row, it flags an error:
CREATE TRIGGER HIR#2_UPD
NO CASCADE BEFORE UPDATE OF pkey ON hierarchy#2
REFERENCING NEW AS NNN
            OLD AS OOO
FOR EACH ROW MODE DB2SQL
WHEN (NNN.pkey <> OOO.pkey
 AND  NNN.pkey <> NNN.keyy)
   WITH temp (keyy, pkey) AS
     (VALUES (NNN.keyy
             ,NNN.pkey)
      UNION ALL
      SELECT LP2.keyy
            ,CASE
                WHEN LP2.keyy = NNN.keyy
                THEN RAISE_ERROR('70001','LOOP FOUND')
                ELSE LP2.pkey
             END
      FROM   hierarchy#2 LP2
            ,temp        TMP
      WHERE  TMP.pkey  = LP2.keyy
        AND  TMP.keyy <> TMP.pkey
     )
   SELECT *
   FROM   temp;

HIERARCHY#2
+--------------+
|KEYY|PKEY|DATA|
|----|----|----|
|AAA |AAA |S...|
|BBB |AAA |M...|
|CCC |BBB |M...|
|DDD |CCC |M...|
|EEE |BBB |J...|
+--------------+
Figure 828, Trigger to check for recursive data structures before update of PKEY
NOTE: The above is a BEFORE trigger, which means that it gets run before the change is
applied to the database. By contrast, the triggers that maintain the exploded table are all
AFTER triggers. In general, one uses before triggers to check for data validity, while after
triggers are used to propagate changes.
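The following pair of updates is a small sketch of how the above trigger might behave against the sample HIERARCHY#2 data. The statements are illustrative only, and the outcomes described in the comments assume the five sample rows shown above are the only rows in the table.

```sql
-- Invalid: DDD is a descendant of BBB (DDD -> CCC -> BBB), so making
-- DDD the parent of BBB would create a loop. The BEFORE trigger is
-- expected to reject this statement with SQLSTATE 70001.
UPDATE hierarchy#2
SET    pkey = 'DDD'
WHERE  keyy = 'BBB';

-- Valid: move CCC (and its dependent DDD) directly under AAA.
-- The trigger walks up from AAA, reaches the root, and finds no loop.
UPDATE hierarchy#2
SET    pkey = 'AAA'
WHERE  keyy = 'CCC';
```

The invalid update is run first on purpose. Once CCC has been moved under AAA, DDD is no longer below BBB, and the first statement would no longer form a loop.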
Creating an Exploded Equivalent
The following exploded table is exactly the same as the previous. It will be maintained in
sync with changes to the related hierarchy table:
CREATE TABLE exploded#2
(pkey   CHAR(4)    NOT NULL
,ckey   CHAR(4)    NOT NULL
,lvl    SMALLINT   NOT NULL
,PRIMARY KEY(pkey,ckey));
Figure 830, Trigger to maintain exploded table after delete in hierarchy table
The next trigger is run every time a row is inserted into the hierarchy table. It uses recursive
code to scan the hierarchy table upwards, looking for all parents of the new row. The resultset is then inserted into the exploded table:
CREATE TRIGGER EXP#2_INS
AFTER INSERT ON hierarchy#2
REFERENCING NEW AS NNN
FOR EACH ROW MODE DB2SQL
INSERT
INTO   exploded#2
WITH temp(pkey, ckey, lvl) AS
  (SELECT NNN.keyy
         ,NNN.keyy
         ,0
   FROM   hierarchy#2
   WHERE  keyy = NNN.keyy
   UNION ALL
   SELECT N.pkey
         ,NNN.keyy
         ,T.lvl +1
   FROM   temp        T
         ,hierarchy#2 N
   WHERE  N.keyy  = T.pkey
     AND  N.keyy <> N.pkey
  )
SELECT *
FROM   temp;
HIERARCHY#2
+--------------+
|KEYY|PKEY|DATA|
|----|----|----|
|AAA |AAA |S...|
|BBB |AAA |M...|
|CCC |BBB |M...|
|DDD |CCC |M...|
|EEE |BBB |J...|
+--------------+
EXPLODED#2
+-------------+
|PKEY|CKEY|LVL|
|----|----|---|
|AAA |AAA | 0|
|AAA |BBB | 1|
|AAA |CCC | 2|
|AAA |DDD | 3|
|AAA |EEE | 2|
|BBB |BBB | 0|
|BBB |CCC | 1|
|BBB |DDD | 2|
|BBB |EEE | 1|
|CCC |CCC | 0|
|CCC |DDD | 1|
|DDD |DDD | 0|
|EEE |EEE | 0|
+-------------+
Figure 831, Trigger to maintain exploded table after insert in hierarchy table
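To see the insert trigger at work, one could add a new object and then look at the rows generated for it. This is a sketch only: the key FFF and the data value are made up for the example, and the expected result assumes the sample data shown above.

```sql
-- Add a new object FFF under EEE (key and data value are hypothetical)
INSERT INTO hierarchy#2 VALUES ('FFF','EEE','Test');

-- Show the rows that the trigger generated for the new object
SELECT   pkey
        ,ckey
        ,lvl
FROM     exploded#2
WHERE    ckey = 'FFF'
ORDER BY lvl;
```

The query should return four rows, one for the object itself (FFF, level 0) and one for each parent on the way up: EEE (1), BBB (2), and AAA (3).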
The next trigger is run every time a PKEY value is updated in the hierarchy table. It deletes
and then reinserts all rows pertaining to the updated object, and all its dependents. The code
goes as follows:
Delete all rows that point to children of the row being updated. The row being updated is also
considered to be a child.
In the following insert, first use recursion to get a list of all of the children of the row that has
been updated. Then work out the relationships between all of these children and all of their
parents. Insert this second result-set back into the exploded table.
CREATE TRIGGER EXP#2_UPD
AFTER UPDATE OF pkey ON hierarchy#2
REFERENCING OLD AS OOO
NEW AS NNN
FOR EACH ROW MODE DB2SQL
BEGIN ATOMIC
   DELETE
   FROM   exploded#2
   WHERE  ckey IN
         (SELECT ckey
          FROM   exploded#2
          WHERE  pkey = OOO.keyy);
   INSERT
   INTO   exploded#2
   WITH temp1(ckey) AS
     (VALUES (NNN.keyy)
      UNION ALL
      SELECT N.keyy
      FROM   temp1       T
            ,hierarchy#2 N
      WHERE  N.pkey  = T.ckey
        AND  N.pkey <> N.keyy
     )
Figure 832, Trigger to run after update of PKEY in hierarchy table (part 1 of 2)
Figure 833, Trigger to run after update of PKEY in hierarchy table (part 2 of 2)
NOTE: The above trigger lacks a statement terminator because it contains atomic SQL,
which means that the semi-colon can not be used. Choose anything you like.
Querying the Exploded Table
Once supplied with suitable indexes, the exploded table can be queried like any other table. It
will always return the current state of the data in the related hierarchy table.
SELECT   *
FROM     exploded#2
WHERE    pkey = :host-var
ORDER BY pkey
        ,ckey
        ,lvl;
Retaining a Record
This chapter will describe a rather complex table/view/trigger schema that will enable us to
offer several features that are often asked for:
Show the state of the data, as it was, at any point in the past (historical analysis).
Follow the sequence of changes to any item (e.g. customer) in the database.
Do "what if" analysis by creating virtual copies of the real world, and then changing them
as desired, without affecting the real-world data.
Some sample code to illustrate the above concepts will be described below. A more complete
example is available from my website.
Schema Design
Recording Changes
The history table has the same fields as the original Customer table, plus the following:
CUR-USER: The user who made the change (for auditing purposes).
PRV-CUST#: The previous customer number. This field enables one to follow the sequence
of changes for a given customer. The value is null if the action is an insert.
PRV-TS: The timestamp of the last time the row was changed (null for inserts).
Observe that this history table does not have an end-timestamp. Rather, each row points back
to the one that it (optionally) replaces. One advantage of such a schema is that there can be a
many-to-one relationship between any given row, and the row, or rows, that replace it. When
we add versions into the mix, this will become important.
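The backward-pointing design described above can be followed with recursive SQL. The query below is a minimal sketch, not the author's code. It assumes the history table is called CUSTOMER_HIS, that its columns are named CUST#, CUR_TS, CUR_ACTN, PRV_CUST# and PRV_TS, and that CUST# is numeric. The customer number 1 and the stop value of 1000 are arbitrary choices for the example.

```sql
WITH chain (cust#, cur_ts, cur_actn, prv_cust#, prv_ts, steps) AS
  (SELECT cust#, cur_ts, cur_actn, prv_cust#, prv_ts, 0
   FROM   customer_his
   WHERE  cust#     = 1        -- hypothetical customer number
     AND  prv_cust# IS NULL    -- start at the original insert
   UNION ALL
   SELECT h.cust#, h.cur_ts, h.cur_actn, h.prv_cust#, h.prv_ts
         ,c.steps + 1
   FROM   chain        c
         ,customer_his h
   WHERE  h.prv_cust# = c.cust#    -- row H replaces row C
     AND  h.prv_ts    = c.cur_ts
     AND  c.steps     < 1000       -- guard against runaway recursion
  )
SELECT   *
FROM     chain
ORDER BY steps;
```

Each output row is one state of the customer, in the order in which the changes were made.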
Triggers
Below is the relevant insert trigger. It replicates the new customer row in the history table,
along with the new fields. Observe that the two "previous" fields are set to null:
CREATE TRIGGER customer_ins
AFTER
INSERT ON customer
REFERENCING NEW AS nnn
FOR EACH ROW
MODE DB2SQL
   INSERT INTO customer_his VALUES
   (nnn.cust#
   ,nnn.cust_name
   ,nnn.cust_mgr
   ,CURRENT TIMESTAMP
   ,'I'
   ,USER
   ,NULL
   ,NULL);
Below is the delete trigger. It is similar to the update trigger, except that the action is different
and we are under no obligation to copy over the old non-key-data columns - but we can if we
wish:
CREATE TRIGGER customer_del
AFTER
DELETE ON customer
REFERENCING OLD AS ooo
FOR EACH ROW
MODE DB2SQL
   INSERT INTO customer_his VALUES
   (ooo.cust#
   ,NULL
   ,NULL
   ,CURRENT TIMESTAMP
   ,'D'
   ,USER
   ,ooo.cust#
   ,(SELECT MAX(cur_ts)
     FROM   customer_his hhh
     WHERE  ooo.cust# = hhh.cust#));
We are now going to define a view that will let the user query the customer-history table - as
if it were the ordinary customer table, but to look at the data as it was at any point in the past.
To enable us to hide all the nasty SQL that is required to do this, we are going to ask that the
user first enter a row into a profile table that has two columns:
The point in time at which the user wants to see the customer data.
NOT NULL
NOT NULL DEFAULT 9999-12-31-24.00.00
There does not exist any row that "replaces" the current row (and that row has a current
timestamp that is <= to the profile timestamp).
Data items that are updated very frequently (e.g. customer daily balance) may perform
poorly when queried because many rows will have to be processed in order to find the
one that has not been replaced.
The view uses the USER special register, which may not be unique per actual user.
The next design is similar to the previous, but we are also going to allow users to both see and
change the world - as it was in the past, and as it is now, without affecting the real-world data.
These extra features require a much more complex design:
We cannot use a base table and a related history table, as we did above. Instead we have
just the latter, and use both views and INSTEAD OF triggers to make the users think that
they are really seeing and/or changing the former.
We need a version table - to record when the data in each version (i.e. virtual copy of the
real world) separates from the real world data.
Data integrity features, like referential integrity rules, have to be hand-coded in triggers,
rather than written using standard DB2 code.
Version Table
Each version must have a begin-timestamp, which records at what point in time it separates from the real world. This value must be <= the current time.
Rows cannot be updated or deleted in this table - only inserted. This rule is necessary to
ensure that we can always trace all changes - in every version.
The real-world is deemed to have a version number of zero, and a begin-timestamp value
of high-values.
Profile Table
The following profile table has one row per user (i.e. USER special register) that reads from
or changes the data tables. It records what version the user is currently using (note: the version timestamp data is maintained using triggers):
CREATE TABLE profile
(user_id      VARCHAR(10)   NOT NULL
,vrsn         INTEGER       NOT NULL
,vrsn_bgn_ts  TIMESTAMP     NOT NULL
,CONSTRAINT profile1 FOREIGN KEY(vrsn)
 REFERENCES version(vrsn)
 ON DELETE RESTRICT
,PRIMARY KEY(user_id));
The first three fields are the only ones that the user will see.
The users will never update this table directly. They will make changes to a view of the
table, which will then invoke INSTEAD OF triggers.
The foreign key check (on version) can be removed - if it is forbidden to ever delete any
version. This check stops the removal of versions that have changed data.
The constraint on CUR_ACTN is just a double-check - to make sure that the triggers that
will maintain this table do not have an error. It can be removed, if desired.
The secondary index will make the following view more efficient.
CUR-USER: The user who made the change (for auditing purposes).
PRV-CUST#: The previous customer number. This field enables one to follow the sequence
of changes for a given customer. The value is null if the action is an insert.
PRV-TS: The timestamp of the last time the row was changed (null for inserts).
PRV-VRSN: The version of the row being replaced (null for inserts).
Views
The following view displays the current state of the data in the above customer table - based
on the version that the user is currently using:
CREATE VIEW customer_vw AS
SELECT *
FROM   customer_his hhh
      ,profile      ppp
WHERE  ppp.user_id   = USER
  AND  hhh.cur_actn <> 'D'
  AND ((ppp.vrsn     = 0
  AND   hhh.cur_vrsn = 0)
   OR  (ppp.vrsn     > 0
  AND   hhh.cur_vrsn = 0
  AND   hhh.cur_ts   < ppp.vrsn_bgn_ts)
   OR  (ppp.vrsn     > 0
  AND   hhh.cur_vrsn = ppp.vrsn))
  AND  NOT EXISTS
      (SELECT *
       FROM   customer_his nnn
       WHERE  nnn.prv_cust# = hhh.cust#
         AND  nnn.prv_ts    = hhh.cur_ts
         AND  nnn.prv_vrsn  = hhh.cur_vrsn
         AND ((ppp.vrsn     = 0
         AND   nnn.cur_vrsn = 0)
          OR  (ppp.vrsn     > 0
         AND   nnn.cur_vrsn = 0
         AND   nnn.cur_ts   < ppp.vrsn_bgn_ts)
          OR  (ppp.vrsn     > 0
         AND   nnn.cur_vrsn = ppp.vrsn)));
The version is either zero (i.e. reality), or the user's current version.
If the version is reality, then the current timestamp is < the version begin-timestamp (as
duplicated in the profile table).
There does not exist any row that "replaces" the current row (and that row has a current
timestamp that is <= to the profile (version) timestamp).
To make things easier for the users, we will create another view that sits on top of the above
view. This one only shows the business fields:
CREATE VIEW customer AS
SELECT cust#
      ,cust_name
      ,cust_mgr
FROM   customer_vw;
The following INSTEAD OF trigger traps all inserts to the first view above, and then applies
the insert to the underlying table - with suitable modifications:
CREATE TRIGGER customer_ins
INSTEAD OF
INSERT ON customer_vw
REFERENCING NEW AS nnn
FOR EACH ROW
MODE DB2SQL
   INSERT INTO customer_his VALUES
   (nnn.cust#
   ,nnn.cust_name
   ,nnn.cust_mgr
   ,CURRENT TIMESTAMP
   ,(SELECT vrsn
     FROM   profile
     WHERE  user_id = USER)
   ,CASE
       WHEN 0 < (SELECT COUNT(*)
                 FROM   customer
                 WHERE  cust# = nnn.cust#)
       THEN RAISE_ERROR('71001','ERROR: Duplicate cust#')
       ELSE 'I'
    END
   ,USER
   ,NULL
   ,NULL
   ,NULL);
A check is done to see if the customer number is unique. One cannot use indexes to enforce such rules in this schema, so one has to code accordingly.
Update Trigger
The following INSTEAD OF trigger traps all updates to the first view above, and turns them
into an insert to the underlying table - with suitable modifications:
CREATE TRIGGER customer_upd
INSTEAD OF
UPDATE ON customer_vw
REFERENCING NEW AS nnn
OLD AS ooo
FOR EACH ROW
MODE DB2SQL
INSERT INTO customer_his VALUES
(nnn.cust#
The following INSTEAD OF trigger traps all deletes to the first view above, and turns them
into an insert to the underlying table - with suitable modifications:
CREATE TRIGGER customer_del
INSTEAD OF
DELETE ON customer_vw
REFERENCING OLD AS ooo
FOR EACH ROW
MODE DB2SQL
INSERT INTO customer_his VALUES
(ooo.cust#
,ooo.cust_name
,ooo.cust_mgr
,CURRENT TIMESTAMP
,ooo.vrsn
,'D'
,ooo.user_id
,ooo.cust#
,ooo.cur_ts
,ooo.cur_vrsn);
The only thing that the user need see in the above schema is the simplified (second) view that
lists the business data columns. They would insert, update, and delete the rows in this view as
if they were working on a real table. Under the covers, the relevant INSTEAD OF trigger
would convert whatever they did into a suitable insert to the underlying table.
To do "what if" analysis, all one need do is insert a new row into the version table - with
a begin timestamp that is the current time. This insert creates a virtual copy of every table
in the application, which one can then update as desired.
To switch between versions, all one need do is update one's row in the profile table.
One can use recursive SQL (not shown here) to follow the sequence of changes to any
particular item, in any particular version.
Data items that are updated very frequently (e.g. customer daily balance) may perform
poorly when queried because many rows will have to be processed in order to find the
one that has not been replaced.
The views use the USER special register, which may not be unique per actual user.
Data integrity features, like referential integrity rules, cascading deletes, and unique key
checks, have to be hand-coded in the INSTEAD OF triggers.
Getting the triggers right is quite hard. If the target application has many tables, it might
be worthwhile to first create a suitable data-dictionary, and then write a script that generates as much of the code as possible.
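The two steps needed to start a "what if" analysis (create a version, then switch to it) might look something like the sketch below. It rests on several assumptions: the version table is taken to have columns named VRSN and VRSN_BGN_TS (only VRSN is confirmed by the foreign key shown earlier), version number 1 is arbitrary, and the profile update is written against the base table for brevity, whereas in the full design it would go through a view with INSTEAD OF triggers.

```sql
-- Create version 1, which separates from the real world as of now.
-- The VRSN_BGN_TS column name is an assumption made for this sketch.
INSERT INTO version (vrsn, vrsn_bgn_ts)
VALUES (1, CURRENT TIMESTAMP);

-- Switch the current user to the new version. In the full design a
-- trigger would also copy the version begin-timestamp into the profile.
UPDATE profile
SET    vrsn    = 1
WHERE  user_id = USER;
```

To return to the real world, the same user would set VRSN back to zero.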
Sample Code
See my website for more detailed sample code using the above schema.
Reproducible.
Easy to make.
Similar to Production:
Select a single column/row entity, but do not use a table or view as the data source.
WITH temp1 (col1) AS
  (VALUES 0
  )
SELECT *
FROM   temp1;

ANSWER
======
COL1
----
   0
Select multiple rows and columns, but do not use a table or view as the data source.
WITH temp1 (col1, col2, col3) AS
  (VALUES ( 0, 'AA', 0.00)
         ,( 1, 'BB', 1.11)
         ,( 2, 'CC', 2.22)
  )
SELECT *
FROM   temp1;

ANSWER
==============
COL1 COL2 COL3
---- ---- ----
   0 AA   0.00
   1 BB   1.11
   2 CC   2.22
Create the set of integers between zero and one hundred. In this statement we shall use recursive coding to expand a single value into many more.
WITH temp1 (col1) AS
  (VALUES 0
   UNION ALL
   SELECT col1 + 1
   FROM   temp1
   WHERE  col1 + 1 < 100
  )
SELECT *
FROM   temp1;

ANSWER
======
COL1
----
   0
   1
   2
   3
 etc
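The same recursive pattern works for other data types. As a sketch, the following statement expands a single starting date into one row per day for a week. The date range is an arbitrary choice for the example.

```sql
WITH temp1 (dt) AS
  (VALUES DATE('2004-01-01')
   UNION ALL
   SELECT dt + 1 DAY
   FROM   temp1
   WHERE  dt + 1 DAY <= DATE('2004-01-07')
  )
SELECT dt
      ,DAYNAME(dt) AS day_name
FROM   temp1;
```

As with the integer example, the stop condition in the WHERE clause is what ends the recursion.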
Create the complete set of integers between zero and one hundred. Display ten numbers in
each line of output.
WITH temp1 (c0,c1,c2,c3,c4,c5,c6,c7,c8,c9) AS
  (VALUES ( 0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
   UNION ALL
   SELECT c0+10, c1+10, c2+10, c3+10, c4+10
         ,c5+10, c6+10, c7+10, c8+10, c9+10
   FROM   temp1
   WHERE  c0+10 < 100
  )
SELECT *
FROM   temp1;

ANSWER
=============================
C0 C1 C2 C3 C4 C5 C6 C7 C8 C9
-- -- -- -- -- -- -- -- -- --
 0  1  2  3  4  5  6  7  8  9
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69
70 71 72 73 74 75 76 77 78 79
80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95 96 97 98 99
Create a report that shows the cosine of every angle between zero and ninety degrees (accurate to one tenth of a degree).
WITH temp1 (degree) AS
  (VALUES SMALLINT(0)
   UNION ALL
   SELECT SMALLINT(degree + 1)
   FROM   temp1
   WHERE  degree < 89
  )
SELECT degree
      ,DEC(COS(RADIANS(degree + 0.0)),4,3) AS point0
      ,DEC(COS(RADIANS(degree + 0.1)),4,3) AS point1
      ,DEC(COS(RADIANS(degree + 0.2)),4,3) AS point2
      ,DEC(COS(RADIANS(degree + 0.3)),4,3) AS point3
      ,DEC(COS(RADIANS(degree + 0.4)),4,3) AS point4
      ,DEC(COS(RADIANS(degree + 0.5)),4,3) AS point5
      ,DEC(COS(RADIANS(degree + 0.6)),4,3) AS point6
      ,DEC(COS(RADIANS(degree + 0.7)),4,3) AS point7
      ,DEC(COS(RADIANS(degree + 0.8)),4,3) AS point8
      ,DEC(COS(RADIANS(degree + 0.9)),4,3) AS point9
FROM   temp1;
So far, all we have done is create different sets of fixed data. These are usually not suitable
for testing purposes because they are too consistent. To mess things up a bit we need to use
the RAND function which generates random numbers in the range of zero to one inclusive. In
the next example we will get a (reproducible) list of five random numeric values:
ANSWER
============
SEQ# RAN1
---- -----
   0 0.001
   1 0.563
   2 0.193
   3 0.808
   4 0.585
With a bit of data manipulation, the GENERATE_UNIQUE function can be used (instead of
the RAND function) to make suitably random test data. There are advantages and disadvantages
to using both functions:
The GENERATE_UNIQUE function makes data that is always unique. The RAND function only outputs one of 32,000 distinct values.
The RAND function can make reproducible random data, while the GENERATE_UNIQUE function can not.
See the description of the GENERATE_UNIQUE function (see page 128) for an example of
how to use it to make random data.
Make Random Data - Different Ranges
There are several ways to mess around with the output from the RAND function: We can use
simple arithmetic to alter the range of numbers generated (e.g. convert from 0 to 10 to 0 to
10,000). We can alter the format (e.g. from FLOAT to DECIMAL). Lastly, we can make
fewer, or more, distinct random values (e.g. from 32K distinct values down to just 10). All of
this is done below:
WITH temp1 (s1, r1) AS
  (VALUES (0, RAND(2))
   UNION ALL
   SELECT s1+1, RAND()
   FROM   temp1
   WHERE  s1+1 < 5
  )
SELECT SMALLINT(s1)       AS seq#
      ,SMALLINT(r1*10000) AS ran2
      ,DECIMAL(r1,6,4)    AS ran1
      ,SMALLINT(r1*10)    AS ran3
FROM   temp1;

ANSWER
========================
SEQ# RAN2 RAN1   RAN3
---- ---- ------ ----
   0   13 0.0013    0
   1 8916 0.8916    8
   2 7384 0.7384    7
   3 5430 0.5430    5
   4 8998 0.8998    8
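The same arithmetic can be used to turn a random number into a random date. The following is a sketch only: the base date and the 364-day range are arbitrary, chosen so that every generated value falls in 1995.

```sql
WITH temp1 (s1, r1) AS
  (VALUES (0, RAND(2))
   UNION ALL
   SELECT s1+1, RAND()
   FROM   temp1
   WHERE  s1+1 < 5
  )
SELECT SMALLINT(s1)                                AS seq#
      -- scale 0..1 up to 0..364 whole days, then add to the base date
      ,DATE('1995-01-01') + INTEGER(r1 * 364) DAYS AS ran_dt
FROM   temp1;
```

Because RAND can return both zero and one, the multiplier is 364 rather than 365, which keeps the largest possible value at 1995-12-31.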
The RAND function generates random numbers. To get random character data one has to
convert the RAND output into a character. There are several ways to do this. The first method
shown below uses the CHR function to convert a number in the range: 65 to 90 into the ASCII equivalent: "A" to "Z". The second method uses the CHAR function to translate a number
into the character equivalent.
ANSWER
===================
SEQ# RAN2 RAN3 RAN4
---- ---- ---- ----
   0   65 A    65
   1   88 X    88
   2   84 T    84
   3   79 O    79
   4   88 X    88
In the real world, there is a tendency for certain data values to show up much more frequently
than others. Likewise, separate fields in a table usually have independent semi-random data
distribution patterns. In the next statement we create four independently random fields. The
first has the usual 32K distinct values evenly distributed in the range of zero to one. The second is the same, except that it has many more distinct values (approximately 32K squared).
The third and fourth have random numbers that are skewed towards the low end of the range
with average values of 0.25 and 0.125 respectively.
WITH temp1 (s1,r1,r2,r3,r4) AS
  (VALUES (0
          ,RAND(2)
          ,RAND()+(RAND()/1E5)
          ,RAND()* RAND()
          ,RAND()* RAND()* RAND())
   UNION ALL
   SELECT s1 + 1
         ,RAND()
         ,RAND()+(RAND()/1E5)
         ,RAND()* RAND()
         ,RAND()* RAND()* RAND()
   FROM   temp1
   WHERE  s1 + 1 < 5
  )
SELECT SMALLINT(s1)    AS s#
      ,INTEGER(r1*1E6) AS ran1, INTEGER(r2*1E6) AS ran2
      ,INTEGER(r3*1E6) AS ran3, INTEGER(r4*1E6) AS ran4
FROM   temp1;

ANSWER
==============================
S# RAN1   RAN2   RAN3   RAN4
-- ------ ------ ------ ------
 0   1373 169599 182618 215387
 1 326700 445273 539604 357592
 2 909848 981267   7140  81553
 3 454573 577320 309318 166436
 4 875942 257823 207873   9628
So far, all we have done in this chapter is use SQL to select sets of rows. Now we shall create
a Production-like table for performance testing purposes. We will then insert 10,000 rows of
suitably lifelike test data into the table. The DDL, with constraints and index definitions, follows. The important things to note are:
SOCSEC#     L_NME
----------- ---------
484-10-9999 Mimytmbi
449-38-9998 Liiiemea
979-90-9997 Zytaebma
580-50-9993 Pimmeeat
264-87-9994 Geimteei
661-84-9995 Rbiybeet
554-53-9990 Oiiaiaia
482-23-9991 Mimtmamb
536-41-9992 Nieebayt
The SOCSEC# field presented three problems: It had to be unique, it had to be random
with respect to the current employee number, and it is a character field with special layout constraints (see the DDL on page 328).
To make it random, the first five digits were defined using two of the temporary random
number fields. To try and ensure that it was unique, the last four digits contain part of the
employee number with some digit-flipping done to hide things. Also, the first random
number used is the one with lots of unique values. The special formatting that this field
required is addressed by making everything in pieces and then concatenating.
The JOB FUNCTION is determined using the fourth (highly skewed) random number.
This ensures that we get many more workers than managers.
The DEPT is derived from another, somewhat skewed, random number with a range of
values from one to ninety nine.
The SALARY is derived using the same, highly skewed, random number that was used
for the job function calculation. This ensures that these two fields have related values.
The BIRTH DATE is a random date value somewhere between 1930 and 1981.
The FIRST NAME is derived using seven independent invocations of the CHR function,
each of which is going to give a somewhat different result.
The LAST NAME is (mostly) made by using the TRANSLATE function to convert a
large random number into a corresponding character value. The output is skewed towards
some of the vowels and the lower-range characters during the translation.
Time-Series Processing
The following table holds data for a typical time-series application. Observe that each row
has both a beginning and ending date, and that there are three cases where there is a gap between the end-date of one row and the begin-date of the next (with the same key).
CREATE TABLE time_series
(kyy      CHAR(03)   NOT NULL
,bgn_dt   DATE       NOT NULL
,end_dt   DATE       NOT NULL
,CONSTRAINT tsx1 PRIMARY KEY(kyy,bgn_dt)
,CONSTRAINT tsc1 CHECK (kyy <> '')
,CONSTRAINT tsc2 CHECK (bgn_dt <= end_dt));
COMMIT;
INSERT INTO time_series VALUES
('AAA','1995-10-01','1995-10-04'),
('AAA','1995-10-06','1995-10-06'),
('AAA','1995-10-07','1995-10-07'),
('AAA','1995-10-15','1995-10-19'),
('BBB','1995-10-01','1995-10-01'),
('BBB','1995-10-03','1995-10-03');
We want to find any cases where the begin-to-end date range of one row overlaps another
with the same key value. In our test database, this query will return no rows.
The following diagram illustrates what we are trying to find. The row at the top (shown as a
bold line) is overlapped by each of the four lower rows, but the nature of the overlap differs in
each case.
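[Diagram: a time axis with five horizontal bars, each labelled ROW. The top bar is overlapped in a different way by each of the four bars below it.]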
The relevant SQL follows. When reading it, think of the "A" table as being the double line
above and "B" table as being the four overlapping rows shown as single lines.
SELECT kyy
      ,bgn_dt
      ,end_dt
FROM   time_series a
WHERE  EXISTS
      (SELECT *
       FROM   time_series b
       WHERE  a.kyy     = b.kyy
         AND  a.bgn_dt <> b.bgn_dt
         AND (a.bgn_dt BETWEEN b.bgn_dt AND b.end_dt
          OR  b.bgn_dt BETWEEN a.bgn_dt AND a.end_dt))
ORDER BY 1,2;

ANSWER
=========
<no rows>
The first predicate in the above sub-query joins the rows together by matching key value. The
second predicate makes sure that one row does not match against itself. The final two predicates look for overlapping date ranges.
The above query relies on the sample table data being valid (as defined by the CHECK constraints in the DDL on page 330). This means that the END_DT is always greater than or equal
to the BGN_DT, and each KYY, BGN_DT combination is unique.
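Because the sample data has no overlaps, the query above cannot be seen doing anything useful. One way to try it out, sketched below, is to temporarily add a row that overlaps an existing one. The inserted row is made up for this test and is deleted again afterwards.

```sql
-- Temporarily add a row that overlaps the BBB row for 1995-10-03
INSERT INTO time_series VALUES ('BBB','1995-10-02','1995-10-05');

SELECT kyy
      ,bgn_dt
      ,end_dt
FROM   time_series a
WHERE  EXISTS
      (SELECT *
       FROM   time_series b
       WHERE  a.kyy     = b.kyy
         AND  a.bgn_dt <> b.bgn_dt
         AND (a.bgn_dt BETWEEN b.bgn_dt AND b.end_dt
          OR  b.bgn_dt BETWEEN a.bgn_dt AND a.end_dt))
ORDER BY 1,2;

-- Remove the test row so that later examples see the original data
DELETE
FROM   time_series
WHERE  kyy    = 'BBB'
  AND  bgn_dt = '1995-10-02';
```

With the test row in place, the query should return two rows: the new row (1995-10-02 to 1995-10-05) and the existing BBB row for 1995-10-03, since each overlaps the other.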
Find Gaps in Time-Series
We want to find all those cases in the TIME_SERIES table when the ending of one row is not
exactly one day less than the beginning of the next (if there is a next). The following query
will answer this question. It consists of both a join and a sub-query. In the join (which is done
first), we match each row with every other row that has the same key and a BGN_DT that is
more than one day greater than the current END_DT. Next, the sub-query excludes from the
result those join-rows where there is an intermediate third row.
SELECT a.kyy
      ,a.bgn_dt
      ,a.end_dt
      ,b.bgn_dt
      ,b.end_dt
      ,DAYS(b.bgn_dt) - DAYS(a.end_dt) AS diff
FROM   time_series a
      ,time_series b
WHERE  a.kyy    = b.kyy
  AND  a.end_dt < b.bgn_dt - 1 DAY
  AND  NOT EXISTS
      (SELECT *
       FROM   time_series z
       WHERE  z.kyy    = a.kyy
         AND  z.kyy    = b.kyy
         AND  z.bgn_dt > a.bgn_dt
         AND  z.bgn_dt < b.bgn_dt)
ORDER BY 1,2;

TIME_SERIES
+-------------------------+
|KYY|BGN_DT    |END_DT    |
|---|----------|----------|
|AAA|1995-10-01|1995-10-04|
|AAA|1995-10-06|1995-10-06|
|AAA|1995-10-07|1995-10-07|
|AAA|1995-10-15|1995-10-19|
|BBB|1995-10-01|1995-10-01|
|BBB|1995-10-03|1995-10-03|
+-------------------------+

ANSWER
===================================================
KYY BGN_DT     END_DT     BGN_DT     END_DT     DIFF
--- ---------- ---------- ---------- ---------- ----
AAA 1995-10-01 1995-10-04 1995-10-06 1995-10-06    2
AAA 1995-10-07 1995-10-07 1995-10-15 1995-10-19    8
BBB 1995-10-01 1995-10-01 1995-10-03 1995-10-03    2
Instead of looking at those rows that encompass a gap in the data, we may want to look at the
actual gap itself. To this end, the following SQL differs from the prior in that the SELECT list
has been modified to get the start, end, and duration, of each gap.
SELECT a.kyy                                  AS kyy
      ,a.end_dt + 1 DAY                       AS bgn_gap
      ,b.bgn_dt - 1 DAY                       AS end_gap
      ,(DAYS(b.bgn_dt) - DAYS(a.end_dt) - 1)  AS sz
FROM   time_series a
      ,time_series b
WHERE  a.kyy    = b.kyy
  AND  a.end_dt < b.bgn_dt - 1 DAY
  AND  NOT EXISTS
      (SELECT *
       FROM   time_series z
       WHERE  z.kyy    = a.kyy
         AND  z.kyy    = b.kyy
         AND  z.bgn_dt > a.bgn_dt
         AND  z.bgn_dt < b.bgn_dt)
ORDER BY 1,2;

TIME_SERIES
+-------------------------+
|KYY|BGN_DT    |END_DT    |
|---|----------|----------|
|AAA|1995-10-01|1995-10-04|
|AAA|1995-10-06|1995-10-06|
|AAA|1995-10-07|1995-10-07|
|AAA|1995-10-15|1995-10-19|
|BBB|1995-10-01|1995-10-01|
|BBB|1995-10-03|1995-10-03|
+-------------------------+

ANSWER
============================
KYY BGN_GAP    END_GAP    SZ
--- ---------- ---------- --
AAA 1995-10-05 1995-10-05  1
AAA 1995-10-08 1995-10-14  7
BBB 1995-10-02 1995-10-02  1
Imagine that we wanted to see each individual day in a gap. The following statement does this
by taking the result obtained above and passing it into a recursive SQL statement which then
generates additional rows - one for each day in the gap after the first.
WITH temp (kyy, gap_dt, gsize) AS
  (SELECT a.kyy
         ,a.end_dt + 1 DAY
         ,(DAYS(b.bgn_dt) - DAYS(a.end_dt) - 1)
   FROM   time_series a
         ,time_series b
   WHERE  a.kyy    = b.kyy
     AND  a.end_dt < b.bgn_dt - 1 DAY
     AND  NOT EXISTS
         (SELECT *
          FROM   time_series z
          WHERE  z.kyy    = a.kyy
            AND  z.kyy    = b.kyy
            AND  z.bgn_dt > a.bgn_dt
            AND  z.bgn_dt < b.bgn_dt)
   UNION ALL
   SELECT kyy
         ,gap_dt + 1 DAY
         ,gsize - 1
   FROM   temp
   WHERE  gsize > 1
  )
SELECT   *
FROM     temp
ORDER BY 1,2;

TIME_SERIES
+-------------------------+
|KYY|BGN_DT    |END_DT    |
|---|----------|----------|
|AAA|1995-10-01|1995-10-04|
|AAA|1995-10-06|1995-10-06|
|AAA|1995-10-07|1995-10-07|
|AAA|1995-10-15|1995-10-19|
|BBB|1995-10-01|1995-10-01|
|BBB|1995-10-03|1995-10-03|
+-------------------------+

ANSWER
====================
KYY GAP_DT     GSIZE
--- ---------- -----
AAA 1995-10-05     1
AAA 1995-10-08     7
AAA 1995-10-09     6
AAA 1995-10-10     5
AAA 1995-10-11     4
AAA 1995-10-12     3
AAA 1995-10-13     2
AAA 1995-10-14     1
BBB 1995-10-02     1
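The gap-finding logic used in this section (join each row to the next, then exclude pairs with a row in between) can also be expressed with an OLAP function. The following is an alternative sketch, not the method used above. It assumes that the DB2 level in use supports window aggregation with a ROWS BETWEEN clause, and the column name PRV_END_DT is made up for the example.

```sql
SELECT kyy
      ,prv_end_dt
      ,bgn_dt
      ,DAYS(bgn_dt) - DAYS(prv_end_dt) AS diff
FROM  (SELECT kyy
             ,bgn_dt
             ,end_dt
              -- end-date of the previous row with the same key, if any
             ,MAX(end_dt) OVER(PARTITION BY kyy
                               ORDER BY bgn_dt
                               ROWS BETWEEN 1 PRECEDING
                                        AND 1 PRECEDING) AS prv_end_dt
       FROM   time_series
      )AS xxx
WHERE  prv_end_dt < bgn_dt - 1 DAY
ORDER BY 1,2;
```

For the sample data this should find the same three gaps as the join-based queries, identified by the end-date before the gap and the begin-date after it. Like those queries, it relies on rows with the same key not overlapping.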
One can use the TABLESAMPLE clause to randomly sample rows for subsequent analysis.
table-name [correlation-name] TABLESAMPLE {BERNOULLI | SYSTEM} (percent) [REPEATABLE (num)]
The table-name must refer to a real table. This can include a declared global temporary
table, or a materialized query table. It cannot be a nested table expression.
The sampling is an addition to any predicates specified in the where clause. Under the
covers, sampling occurs before any other query processing, such as applying predicates
or doing a join.
The SYSTEM option lets DB2 find the most efficient way to sample the data. This may
mean that all rows on each page that qualifies are included. For small tables, this method
often results in a misleading percentage of rows selected.
The "percent" number must be equal to or less than 100, and greater than zero. It determines what percentage of the rows processed are returned.
The REPEATABLE option and number is used if one wants to get the same result every
time the query is run (assuming no data changes). Without this option, each run will be
both random and different.
Examples
Sample 5% of the rows in the staff table. Get the same result each time:
SELECT   *
FROM     staff TABLESAMPLE BERNOULLI(5) REPEATABLE(1234)
ORDER BY id;
SELECT   *
FROM     employee ee TABLESAMPLE BERNOULLI(18)
        ,emp_act  ea TABLESAMPLE BERNOULLI(25)
WHERE    ee.empno = ea.empno
ORDER BY ee.empno;

SELECT   *
FROM     session.nyc_staff TABLESAMPLE SYSTEM(34.55)
WHERE    id     < 100
  AND    salary > 100
ORDER BY id;
The DOUBLE, DECIMAL, INTEGER, SMALLINT, and BIGINT functions can all be used
to convert a character field into its numeric equivalent:
WITH temp1 (c1) AS
  (VALUES '123', '345', '567')
SELECT c1
      ,DOUBLE(c1)    AS dbl
      ,DECIMAL(c1,3) AS dec
      ,SMALLINT(c1)  AS sml
      ,INTEGER(c1)   AS int
FROM   temp1;
COMPATIBLE FUNCTIONS
==========================================
DOUBLE, DECIMAL, INTEGER, SMALLINT, BIGINT
DOUBLE, DECIMAL
DOUBLE
There are several ways to check that the input character string is a valid representation of a
number - before doing the conversion. One simple solution involves converting all digits to
blank, then removing the blanks. If the result is not a zero length string, then the input must
have had a character other than a digit:
WITH temp1 (c1) AS (VALUES '123','456','1 2','33%',NULL)
SELECT c1
      ,TRANSLATE(c1,' ','1234567890')                AS c2
      ,LENGTH(LTRIM(TRANSLATE(c1,' ','1234567890'))) AS c3
FROM   temp1;

ANSWER
============
C1   C2   C3
---- ---- --
123        0
456        0
1 2        0
33%    %   1
-    -     -
The only blanks in the input are to the left of the digits.
There is only one "+" or "-" sign, and it is next to the left-side blanks, if any.
--#SET DELIMITER !
IMPORTANT
============
This example
uses an "!"
as the stmt
delimiter.
      IF SUBSTR(instr,ctr,1) = '-' THEN
         SET found_neg = 'Y';
      END IF;
      IF SUBSTR(instr,ctr,1) <> ' ' THEN
         SET bgn_blank = 'N';
      END IF;
      SET ctr = ctr + 1;
   END WHILE wloop;
   IF found_num = 'N' THEN
      SET is_number = 'N';
   END IF;
   RETURN is_number;
END!
WITH TEMP1 (C1) AS
  (VALUES ' 123'
         ,'+123.45'
         ,'456 '
         ,' 10 2 '
         ,' -.23'
         ,'++12356'
         ,'.012349'
         ,' 33%'
         ,' '
         ,NULL)
SELECT C1             AS C1
      ,isnumeric(C1)  AS C2
      ,CASE
          WHEN isnumeric(C1) = 'Y'
          THEN DECIMAL(C1,10,6)
          ELSE NULL
       END            AS C3
FROM   TEMP1!

ANSWER
====================
C1      C2 C3
------- -- ---------
    123 Y  123.00000
+123.45 Y  123.45000
456     N
 10 2   N
   -.23 Y   -0.23000
++12356 N
.012349 Y    0.01234
    33% N
        N
-
The CHAR and DIGITS functions can be used to convert a DB2 numeric field to a character
representation of the same, but as the following example demonstrates, both functions return
problematic output:
SELECT   d_sal
        ,CHAR(d_sal)   AS d_chr
        ,DIGITS(d_sal) AS d_dgt
        ,i_sal
        ,CHAR(i_sal)   AS i_chr
        ,DIGITS(i_sal) AS i_dgt
FROM    (SELECT DEC(salary - 11000,6,2)  AS d_sal
               ,SMALLINT(salary - 11000) AS i_sal
         FROM   staff
         WHERE  salary > 10000
           AND  salary < 12200
        )AS xxx
ORDER BY d_sal;

ANSWER
=========================================
D_SAL   D_CHR    D_DGT  I_SAL I_CHR I_DGT
------- -------- ------ ----- ----- -----
-494.10 -0494.10 049410  -494 -494  00494
 -12.00 -0012.00 001200   -12 -12   00012
 508.60 0508.60  050860   508 508   00508
1009.75 1009.75  100975  1009 1009  01009
336
Below are three user-defined functions that convert integer data from numeric to character,
displaying the output right-justified, and with a sign indicator if negative. There is one function for each flavor of integer that is supported in DB2:
CREATE FUNCTION CHAR_RIGHT(inval SMALLINT)
RETURNS CHAR(06)
RETURN RIGHT(CHAR('',06) CONCAT RTRIM(CHAR(inval)),06);

CREATE FUNCTION CHAR_RIGHT(inval INTEGER)
RETURNS CHAR(11)
RETURN RIGHT(CHAR('',11) CONCAT RTRIM(CHAR(inval)),11);

CREATE FUNCTION CHAR_RIGHT(inval BIGINT)
RETURNS CHAR(20)
RETURN RIGHT(CHAR('',20) CONCAT RTRIM(CHAR(inval)),20);
First, convert the input number to character using the CHAR function.
Then, concatenate a set number of blanks to the left of the value. The number of blanks
appended depends upon the input type, which is why there are three separate functions.
Finally, use the RIGHT function to get the right-most "n" characters, where "n" is the
maximum number of digits (plus the sign indicator) supported by the input type.
SELECT   i_sal
        ,CHAR_RIGHT(i_sal) AS i_chr
FROM    (SELECT SMALLINT(salary - 11000) AS i_sal
         FROM   staff
         WHERE  salary > 10000
           AND  salary < 12200
        )AS xxx
ORDER BY i_sal;

ANSWER
============
I_SAL I_CHR
----- ------
 -494   -494
  -12    -12
  508    508
 1009   1009
Creating a similar function to handle decimal input is a little more complex. One problem is
that the CHAR function adds zeros to decimal data, which we don't want. But a more serious
problem is that there are many sizes and scales of decimal input, but we can only make one
function (with a given name) that must support all possible lengths and scales. This is impossible, so we will have to compromise as best we can.
Imagine that we have two decimal fields, one of which has a length and scale of (31,0), while
the other has a length and scale of (31,31). We cannot create a single function that will handle
both input types without either possibly running out of digits (in the first case), or losing
some precision (in the second case).
NOTE: The fact that one can only have one user-defined function, with a given name, per
DB2 data type, presents a problem for all variable-length data types - notably character,
varchar, and decimal. For character and varchar data, one can address the problem, to
some extent, by using maximum length input and output fields. But decimal data has both
a scale and a length, so there is no way to make an all-purpose decimal function.
Despite all the above, below is a function that converts decimal data to character. It compromises by assuming an input of type decimal(31,12), which should work in most situations:
CREATE FUNCTION CHAR_RIGHT(inval DECIMAL(31,12))
RETURNS CHAR(33)
RETURN CHAR_RIGHT(BIGINT(inval))
       CONCAT '.'
       CONCAT SUBSTR(DIGITS(inval - TRUNCATE(inval,0)),20,12);
First, convert the input number to integer using the standard BIGINT function.
Next, use the previously defined CHAR_RIGHT user-function to convert the BIGINT
data to a right-justified character value.
Finally append the digits (converted to character using the standard DIGITS function)
that represent the decimal component of the input.
SELECT   d_sal
        ,CHAR_RIGHT(d_sal) AS d_chr
FROM    (SELECT DEC(salary - 11000,6,2) AS d_sal
         FROM   staff
         WHERE  salary > 10000
           AND  salary < 12200
        )AS xxx
ORDER BY d_sal;

ANSWER
=========================================
D_SAL   D_CHR
------- ---------------------------------
-494.10                 -494.100000000000
 -12.00                  -12.000000000000
 508.60                  508.600000000000
1009.75                 1009.750000000000
There is absolutely no sane reason why anyone would want to convert a date, time, or timestamp value directly to a number. The only correct way to manipulate such data is to use the
provided date/time functions. But having said that, here is how one does it:
WITH tab1(ts1) AS
(VALUES CAST('1998-11-22-03.44.55.123456' AS TIMESTAMP))
SELECT        ts1                  => 1998-11-22-03.44.55.123456
      ,   HEX(ts1)                 => 19981122034455123456
      ,   DEC(HEX(ts1),20)         => 19981122034455123456.
      ,FLOAT(DEC(HEX(ts1),20))     => 1.99811220344551e+019
      ,REAL (DEC(HEX(ts1),20))     => 1.998112e+019
FROM   tab1;
There is no way in static SQL to vary the number of columns returned by a select statement.
In order to change the number of columns you have to write a new SQL statement and then
rebind. But one can use CASE logic to control whether or not a column returns any data.
Imagine that you are forced to use static SQL. Furthermore, imagine that you do not always
want to retrieve the data from all columns, and that you also do not want to transmit data over
the network that you do not need. For character columns, we can address this problem by retrieving the data only if it is wanted, and otherwise returning a zero-length string. To illustrate, here is an ordinary SQL statement:
SELECT   empno
        ,firstnme
        ,lastname
        ,job
FROM     employee
WHERE    empno < '000100'
ORDER BY empno;
SELECT   empno
        ,CASE :host-var-1
            WHEN 1 THEN firstnme
            ELSE ''
         END                  AS firstnme
        ,CASE :host-var-2
            WHEN 1 THEN lastname
            ELSE ''
         END                  AS lastname
        ,CASE :host-var-3
            WHEN 1 THEN VARCHAR(job)
            ELSE ''
         END                  AS job
FROM     employee
WHERE    empno < '000100'
ORDER BY empno;
Imagine that one had a string of numeric values that one wants to display as a line-bar chart.
With a little coding, this is easy to do in SQL:
SELECT   id
        ,salary
        ,INT(salary / 1500)             AS len
        ,REPEAT('*',INT(salary / 1500)) AS salary_chart
FROM     staff
WHERE    id > 120
  AND    id < 190
ORDER BY id;

ANSWER
===================================
ID  SALARY   LEN SALARY_CHART
--- -------- --- ---------------
130 10505.90   7 *******
140 21150.00  14 **************
150 19456.50  12 ************
160 22959.20  15 ***************
170 12258.50   8 ********
180 12009.75   8 ********
One problem with the above query is that we won't know how long the chart will be until we
run the statement. This may cause problems if we guess wrongly and we are tight for space,
so the next query addresses this issue by creating a chart of known length. To do this, it does
the following:
First, select all of the matching rows and columns and store them in a temporary table.
Next, obtain the MAX value from the field of interest. Then convert this value to an integer and divide by the maximum desired chart length (e.g. 20).
Finally, join the two temporary tables together and display the chart. Because the chart
will never be longer than 20 bytes, we can display it in a 20 byte field.
WITH
temp1 (id, salary) AS
  (SELECT id
         ,salary
   FROM   staff
   WHERE  id > 120
     AND  id < 190),
temp2 (max_sal) AS
  (SELECT INT(MAX(salary)) / 20
   FROM   temp1)
SELECT   id
        ,salary
        ,VARCHAR(REPEAT('*',INT(salary / max_sal)),20) AS salary_chart
FROM     temp1
        ,temp2
ORDER BY id;
The STATS table that is defined on page 116 has a SEX field with just two values, F (for
female) and M (for male). To get a count of the rows by sex we can write the following:
SELECT   sex
        ,COUNT(*) AS num
FROM     stats
GROUP BY sex
ORDER BY sex;

ANSWER >>    SEX NUM
             --- ---
             F   595
             M   405
SELECT   COUNT(*)                                   AS total
        ,SUM(CASE sex WHEN 'F' THEN 1 ELSE 0 END)   AS female
        ,SUM(CASE sex WHEN 'M' THEN 1 ELSE 0 END)   AS male
FROM     stats;
Imagine that we want to select from the EMPLOYEE table the following counts presented in
a tabular list with one line per item. In each case, if nothing matches we want to get a zero:
Note that a given row in the EMPLOYEE table may match more than one of the above criteria. If this were not the case, a simple nested table expression could be used. Instead we will
do the following:
SUBCATEGORY/DEPT              #ROWS
----------------------------- -----
ROWS IN TABLE                    32
SALARY > $20K                    25
NAME LIKE ABC%                    0
NUMBER MALES                     19
ADMINISTRATION SYSTEMS            6
DEVELOPMENT CENTER                0
INFORMATION CENTER                3
MANUFACTURING SYSTEMS             9
OPERATIONS                        5
PLANNING                          1
SOFTWARE SUPPORT                  4
SPIFFY COMPUTER SERVICE DIV.      3
SUPPORT SERVICES                  1
One often has a sequence of values (e.g. invoice numbers) from which one needs both found
and not-found rows. This cannot be done using a simple SELECT statement because some of
the rows being selected may not actually exist. For example, the following query lists the number
of staff that have worked for the firm for "n" years, but it misses those years during which no
staff joined:
SELECT   years
        ,COUNT(*) AS #staff
FROM     staff
WHERE    UCASE(name) LIKE '%E%'
  AND    years <= 5
GROUP BY years;

ANSWER
=============
YEARS #STAFF
----- ------
    1      1
    4      2
    5      3
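One way to get the not-found years as well is to build the list of wanted values first, then left-outer-join the matching rows to it, using COALESCE to turn a missing count into zero. The following is a minimal sketch, assuming that the years of interest are 0 through 5; the names list_years and match_years are illustrative only:

```sql
WITH
list_years (year#) AS                 -- every value we want to see
  (VALUES (0),(1),(2),(3),(4),(5)),
match_years (years, #staff) AS        -- the rows that actually exist
  (SELECT   years
           ,COUNT(*)
   FROM     staff
   WHERE    UCASE(name) LIKE '%E%'
     AND    years <= 5
   GROUP BY years)
SELECT   year#               AS years
        ,COALESCE(#staff,0)  AS #staff -- zero when nothing matched
FROM     list_years
LEFT OUTER JOIN
         match_years
ON       year# = years
ORDER BY 1;
```

If the range of values is large, the VALUES list can be replaced by a recursive common table expression that generates the numbers.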
ANSWER
============
YEARS #STAFF
----- ------
    0      0
    1      1
    2      0
    3      0
    4      2
    5      3
ANSWER
======
YEAR#
-----
    0
    2
    3
Imagine that one has a string of text that one wants to break up into individual words. As long
as the word delimiter is fairly basic (e.g. a blank space), one can use recursive SQL to do this
task. One recursively divides the text into two parts (working from left to right). The first part
is the word found, and the second part is the remainder of the text:
WITH
temp1 (id, data) AS
  (VALUES (01,'SOME TEXT TO PARSE.')
         ,(02,'MORE SAMPLE TEXT.')
         ,(03,'ONE-WORD.')
         ,(04,'')
  ),
temp2 (id, word#, word, data_left) AS
  (SELECT id
         ,SMALLINT(1)
         ,SUBSTR(data,1,
                 CASE LOCATE(' ',data)
                    WHEN 0 THEN LENGTH(data)
                    ELSE LOCATE(' ',data)
                 END)
         ,LTRIM(SUBSTR(data,
                 CASE LOCATE(' ',data)
                    WHEN 0 THEN LENGTH(data) + 1
                    ELSE LOCATE(' ',data)
                 END))
   FROM   temp1
   WHERE  data <> ''
   UNION ALL
   SELECT id
         ,word# + 1
         ,SUBSTR(data_left,1,
                 CASE LOCATE(' ',data_left)
                    WHEN 0 THEN LENGTH(data_left)
                    ELSE LOCATE(' ',data_left)
                 END)
         ,LTRIM(SUBSTR(data_left,
                 CASE LOCATE(' ',data_left)
                    WHEN 0 THEN LENGTH(data_left) + 1
                    ELSE LOCATE(' ',data_left)
                 END))
   FROM   temp2
   WHERE  data_left <> ''
  )
SELECT   *
FROM     temp2
ORDER BY 1,2;
The SUBSTR function is used above to extract both the next word in the string, and the remainder of the text. If there is a blank byte in the string, the SUBSTR stops (or begins, when
getting the remainder) at it. If not, it goes to (or begins at) the end of the string. CASE logic is
used to decide what to do.
ID WORD# WORD      DATA_LEFT
-- ----- --------- --------------
 1     1 SOME      TEXT TO PARSE.
 1     2 TEXT      TO PARSE.
 1     3 TO        PARSE.
 1     4 PARSE.
 2     1 MORE      SAMPLE TEXT.
 2     2 SAMPLE    TEXT.
 2     3 TEXT.
 3     1 ONE-WORD.
In the next example, we shall use recursion to string together all of the employee NAME
fields in the STAFF table (by department):
WITH temp1 (dept,w#,name,all_names) AS
  (SELECT   dept
           ,SMALLINT(1)
           ,MIN(name)
           ,VARCHAR(MIN(name),50)
   FROM     staff a
   GROUP BY dept
   UNION ALL
   SELECT   a.dept
           ,SMALLINT(b.w#+1)
           ,a.name
           ,b.all_names || ' ' || a.name
   FROM     staff a
           ,temp1 b
   WHERE    a.dept = b.dept
     AND    a.name > b.name
     AND    a.name =
           (SELECT MIN(c.name)
            FROM   staff c
            WHERE  c.dept = b.dept
              AND  c.name > b.name)
  )
SELECT   dept
        ,w#
        ,name AS max_name
        ,all_names
FROM     temp1 d
WHERE    w# =
        (SELECT MAX(w#)
         FROM   temp1 e
         WHERE  d.dept = e.dept)
ORDER BY dept;
DEPT W# MAX_NAME  ALL_NAMES
---- -- --------- -------------------------------------------
  10  4 Molinare  Daniels Jones Lu Molinare
  15  4 Rothman   Hanes Kermisch Ngan Rothman
  20  4 Sneider   James Pernal Sanders Sneider
  38  5 Quigley   Abrahams Marenghi Naughton O'Brien Quigley
  42  4 Yamaguchi Koonitz Plotz Scoutten Yamaguchi
  51  5 Williams  Fraye Lundquist Smith Wheeler Williams
  66  5 Wilson    Burke Gonzales Graham Lea Wilson
  84  4 Quill     Davis Edwards Gafney Quill
IMPORTANT: This example uses an "!" as the statement delimiter.
SELECT   dept              AS DEPT
        ,SMALLINT(cnt)     AS W#
        ,mxx               AS MAX_NAME
        ,list_names(dept)  AS ALL_NAMES
FROM    (SELECT   dept
                 ,COUNT(*)  AS cnt
                 ,MAX(name) AS mxx
         FROM     staff
         GROUP BY dept
        )AS ddd
ORDER BY dept!
DB2 lacks a simple function for reversing the contents of a data field. Fortunately, we can
create a function to do it ourselves.
Input vs. Output
Before we do any data reversing, we have to define what the reversed output should look like
relative to a given input value. For example, if we have a four-digit numeric field, the reverse
of the number 123 could be 321, or it could be 3210. The latter value implies that the input
has a leading zero. It also assumes that we really are working with a four digit field. Likewise,
the reverse of the number 123.45 might be 54.321, or 543.21.
Another interesting problem involves reversing negative numbers. If the value "-123" is a
string, then the reverse is probably "321-". If it is a number, then the desired reverse is more
likely to be "-321".
Trailing blanks in character strings are a similar problem. Obviously, the reverse of "ABC" is
"CBA", but what is the reverse of "ABC "? There is no general technical answer to any of
these questions. The correct answer depends upon the business needs of the application.
Below is a user defined function that can reverse the contents of a character field:
--#SET DELIMITER !
IMPORTANT: This example uses an "!" as the statement delimiter.
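The following is a minimal sketch of one such function. It assumes an input of at most 50 bytes (an arbitrary limit chosen for illustration) and takes the position that trailing blanks are not part of the value, so they are dropped before the reversal. A different business rule would need different code:

```sql
--#SET DELIMITER !

CREATE FUNCTION reverse(instr VARCHAR(50))
RETURNS VARCHAR(50)
BEGIN ATOMIC
   DECLARE outstr  VARCHAR(50) DEFAULT '';
   DECLARE curbyte SMALLINT    DEFAULT 0;
   -- Start at the last non-blank byte of the input
   SET curbyte = LENGTH(RTRIM(instr));
   -- Walk backwards, appending one byte at a time to the output
   WHILE curbyte >= 1 DO
      SET outstr  = outstr || SUBSTR(instr,curbyte,1);
      SET curbyte = curbyte - 1;
   END WHILE;
   RETURN outstr;
END!
```

With this definition a null input returns a zero-length string, because the loop never runs. Whether that is the right answer is, once again, a business decision.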
SELECT   id                              AS ID
        ,salary                          AS SALARY1
        ,DEC(reverse(CHAR(salary)),7,4)  AS SALARY2
FROM     staff
WHERE    id < 40
ORDER BY id;

ANSWER
===================
ID SALARY1  SALARY2
-- -------- -------
10 18357.50  5.7538
20 18171.25 52.1718
30 17506.75 57.6057
If all you want to do is remove leading and trailing blanks, the LTRIM and RTRIM functions
can be combined to do the job:
WITH temp (txt) AS
(VALUES ('   HAS LEADING BLANKS')
       ,('HAS TRAILING BLANKS   ')
       ,(' BLANKS BOTH ENDS     '))
SELECT LTRIM(RTRIM(txt))         AS txt2
      ,LENGTH(LTRIM(RTRIM(txt))) AS len
FROM   temp;

ANSWER
=======================
TXT2                LEN
------------------- ---
HAS LEADING BLANKS   18
HAS TRAILING BLANKS  19
BLANKS BOTH ENDS     16
Stripping leading and trailing non-blank characters is a little harder, and is best done by writing your own function. The following example goes thus:
Check that a one-byte strip value was provided. Signal an error if not.
Starting from the left, scan the input string one byte at a time, looking for the character to
be stripped. Stop scanning when something else is found.
Use the SUBSTR function to trim the input-string - up to the first non-target value found.
Starting from the right, scan the left-stripped input string one byte at a time, looking for
the character to be stripped. Stop scanning when something else is found.
Use the SUBSTR function to trim the right side of the already left-trimmed input string.
ANSWER
========================
W# WORD_VAL   STP    LEN
-- ---------- ------ ---
 1 00 abc 000  abc     5
 2 0 0 abc     0 abc   6
 3  sdbs       sdbs    5
 4 000 0               1
 5 0000                0
 6 0                   0
 7 a          a        1
 8                     0
The following user-defined scalar function will sort the contents of a character field in either
ascending or descending order. There are two input parameters:
The input string: As written, the input can be up to 20 bytes long. To sort longer fields,
change the input, output, and OUT-VAL (variable) lengths as desired.
The function uses a very simple, and not very efficient, bubble-sort. In other words, the input
string is scanned from left to right, comparing two adjacent characters at a time. If they are
not in sequence, they are swapped - and a flag indicating this is set on. The scans are repeated
until all of the characters in the string are in order:
--#SET DELIMITER !

CREATE FUNCTION sort_char(in_val VARCHAR(20),sort_dir VARCHAR(1))
RETURNS VARCHAR(20)
BEGIN ATOMIC
   DECLARE cur_pos SMALLINT;
   DECLARE do_sort CHAR(1);
   DECLARE out_val VARCHAR(20);
   IF UCASE(sort_dir) NOT IN ('A','D') THEN
      SIGNAL SQLSTATE '75001'
      SET MESSAGE_TEXT = 'Sort order not A or D';
   END IF;
   SET out_val = in_val;
   SET do_sort = 'Y';
   WHILE do_sort = 'Y' DO
      SET do_sort = 'N';
      SET cur_pos = 1;
      WHILE cur_pos < length(in_val) DO
         IF (UCASE(sort_dir) = 'A'
         AND SUBSTR(out_val,cur_pos+1,1) <
             SUBSTR(out_val,cur_pos,1))
         OR (UCASE(sort_dir) = 'D'
         AND SUBSTR(out_val,cur_pos+1,1) >
             SUBSTR(out_val,cur_pos,1)) THEN
            SET do_sort = 'Y';
            SET out_val = CASE
                             WHEN cur_pos = 1
                             THEN ''
                             ELSE SUBSTR(out_val,1,cur_pos-1)
                          END
                          CONCAT SUBSTR(out_val,cur_pos+1,1)
                          CONCAT SUBSTR(out_val,cur_pos ,1)
                          CONCAT
                          CASE
                             WHEN cur_pos = length(in_val) - 1
                             THEN ''
                             ELSE SUBSTR(out_val,cur_pos+2)
                          END;
         END IF;
         SET cur_pos = cur_pos + 1;
      END WHILE;
   END WHILE;
   RETURN out_val;
END!

IMPORTANT: This example uses an "!" as the statement delimiter.
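A short usage sketch follows. The table name word1 and the three sample values are illustrative assumptions; the sort direction is passed in either case because the function folds it to upper case before testing it:

```sql
WITH word1 (w#, word_val) AS
  (VALUES (1,'12345678')
         ,(2,'ABCDEFG')
         ,(3,'abccb'))
SELECT   w#
        ,word_val
        ,sort_char(word_val,'a') AS sa   -- ascending
        ,sort_char(word_val,'D') AS sd   -- descending
FROM     word1
ORDER BY w#!
```

Passing any direction other than A or D raises SQLSTATE 75001, which fails the whole statement rather than returning a null for the offending row.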
ANSWER
=============================
W# WORD_VAL  SA       SD
-- --------- -------- --------
 1 12345678  12345678 87654321
 2 ABCDEFG   ABCDEFG  GFEDCBA
 3 AaBbCc    aAbBcC   CcBbAa
 4 abccb     abbcc    ccbba
 5 %#.       .#%      %#.
 6 bB        bB       Bb
 7 a         a        a
 8
Imagine that one wanted some query to take exactly four seconds to run. The following query
does just this - by looping (using recursion) until such time as the current system timestamp is
four seconds greater than the system timestamp obtained at the beginning of the query:
WITH temp1 (num,ts1,ts2) AS
  (VALUES (INT(1)
          ,TIMESTAMP(GENERATE_UNIQUE())
          ,TIMESTAMP(GENERATE_UNIQUE()))
   UNION ALL
   SELECT num + 1
         ,ts1
         ,TIMESTAMP(GENERATE_UNIQUE())
   FROM   temp1
   WHERE  TIMESTAMPDIFF(2,CHAR(ts2-ts1)) < 4
  )
SELECT MAX(num) AS #loops
      ,MIN(ts2) AS bgn_timestamp
      ,MAX(ts2) AS end_timestamp
FROM   temp1;

ANSWER
============================================================
#LOOPS BGN_TIMESTAMP              END_TIMESTAMP
------ -------------------------- --------------------------
 58327 2001-08-09-22.58.12.754579 2001-08-09-22.58.16.754634
We can take the above query and convert it into a user-defined function that will loop for "n"
seconds, where "n" is the value passed to the function. However, there are several caveats:
Looping in SQL is a "really stupid" way to hang around for a couple of seconds. A far
better solution would be to call a stored procedure written in an external language that
has a true pause command.
The number of times that the function is invoked may differ, depending on the access
path used to run the query.
The recursive looping is going to result in the calling query getting a warning message.
SELECT   id
        ,SUBSTR(CHAR(TIMESTAMP(GENERATE_UNIQUE())),18) AS ss_mmmmmm
        ,pause(id / 10)                                AS #loops
        ,SUBSTR(CHAR(TIMESTAMP(GENERATE_UNIQUE())),18) AS ss_mmmmmm
FROM     staff
WHERE    id < 31;

ANSWER
=============================
ID SS_MMMMMM #LOOPS SS_MMMMMM
-- --------- ------ ---------
10 50.068593  76386 50.068587
20 52.068744 144089 52.068737
30 55.068930 206101 55.068923
The median is defined as that value in a series of values where half of the values are higher than
it and the other half are lower. The median is a useful number to get when the data has a few
very extreme values that skew the average.
If there are an odd number of values in the list, then the median value is the one in the middle
(e.g. if 7 values, the median value is #4). If there is an even number of matching values, there
are two formulas that one can use:
The most commonly used definition is that the median equals the sum of the two middle
values, divided by two.
A less often used definition is that the median is the smaller of the two middle values.
DB2 does not come with a function for calculating the median, but it can be obtained using
the ROW_NUMBER function. This function is used to assign a row number to every matching row, and then one searches for the row with the middle row number.
Using Formula #1
Below is some sample code that gets the median SALARY, by JOB, for some set of rows in
the STAFF table. Two JOB values are referenced - one with seven matching rows, and one
with four. The query logic goes as follows:
Get the matching set of rows from the STAFF table, and give each row a row-number,
within each JOB value.
Using the set of rows retrieved above, get the maximum row-number, per JOB value,
then add 1.0, then divide by 2, then add or subtract 0.6. This yields two values that
encompass a single row-number, if an odd number of rows match, and two row-numbers,
if an even number of rows match.
Finally, join the one row per JOB obtained in step 2 above to the set of rows retrieved in
step 1 - by common JOB value, and where the row-number is within the high/low range.
The average salary of whatever is retrieved is the median.
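The three steps above can be tried on a handful of literal values before they are applied to a real table. The sketch below assumes four made-up values (10, 20, 30, and 1000); the name sample_vals is illustrative only. With four rows, the mid-point is (4 + 1.0) / 2 = 2.5, so the range 1.9 to 3.1 picks up row-numbers 2 and 3:

```sql
WITH
sample_vals (val) AS
  (VALUES (10),(20),(30),(1000)),
numbered_rows AS                      -- step 1: number the rows
  (SELECT val
         ,ROW_NUMBER() OVER(ORDER BY val) AS row#
   FROM   sample_vals),
median_row_num AS                     -- step 2: low and high bounds
  (SELECT (MAX(row# + 1.0) / 2) - 0.6 AS med_lo
         ,(MAX(row# + 1.0) / 2) + 0.6 AS med_hi
   FROM   numbered_rows)
SELECT   AVG(nn.val) AS median_val    -- step 3: average the middle rows
FROM     numbered_rows  nn
        ,median_row_num mr
WHERE    nn.row# BETWEEN mr.med_lo AND mr.med_hi;
```

The two middle values (20 and 30) are averaged to give a median of 25, whereas the ordinary average of the same four values is 265. Any offset from 0.5 up to, but not including, 1.0 gives the same rows.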
The next example is essentially the same as the prior, but there is additional code that gets
the average SALARY, and a count of the number of matching rows per JOB value. Observe
that all this extra code went in the second step:
WITH numbered_rows AS
  (SELECT s.*
         ,ROW_NUMBER() OVER(PARTITION BY job
                            ORDER     BY salary, id) AS row#
   FROM   staff s
   WHERE  comm > 0
     AND  name LIKE '%e%'),
median_row_num AS
  (SELECT   job
           ,(MAX(row# + 1.0) / 2) - 0.5 AS med_lo
           ,(MAX(row# + 1.0) / 2) + 0.5 AS med_hi
           ,DEC(AVG(salary),7,2)        AS avg_sal
           ,COUNT(*)                    AS #rows
   FROM     numbered_rows
   GROUP BY job)
SELECT   nn.job
        ,DEC(AVG(nn.salary),7,2) AS med_sal
        ,MAX(mr.avg_sal)         AS avg_sal
        ,MAX(mr.#rows)           AS #r
FROM     numbered_rows  nn
        ,median_row_num mr
WHERE    nn.job   = mr.job
  AND    nn.row# BETWEEN mr.med_lo
                     AND mr.med_hi
GROUP BY nn.job
ORDER BY nn.job;

ANSWER
==========================
JOB   MED_SAL  AVG_SAL  #R
----- -------- -------- --
Clerk 13030.50 12857.56  7
Sales 17432.10 17460.93  4
Once again, the following sample code gets the median SALARY, by JOB, for some set of
rows in the STAFF table. Two JOB values are referenced - one with seven matching rows,
and the other with four. In this case, when there are an even number of matching rows, the
smaller of the two middle values is chosen. The logic goes as follows:
Get the matching set of rows from the STAFF table, and give each row a row-number,
within each JOB value.
Using the set of rows retrieved above, get the maximum row-number per JOB, then add
1, then divide by 2. This will get the row-number for the row with the median value.
Finally, join the one row per JOB obtained in step 2 above to the set of rows retrieved in
step 1 - by common JOB and row-number value.
WITH numbered_rows AS
  (SELECT s.*
         ,ROW_NUMBER() OVER(PARTITION BY job
                            ORDER     BY salary, id) AS row#
   FROM   staff s
   WHERE  comm > 0
     AND  name LIKE '%e%'),
median_row_num AS
  (SELECT   job
           ,MAX(row# + 1) / 2 AS med_row#
   FROM     numbered_rows
   GROUP BY job)
SELECT   nn.job
        ,nn.salary AS med_sal
FROM     numbered_rows  nn
        ,median_row_num mr
WHERE    nn.job  = mr.job
  AND    nn.row# = mr.med_row#
ORDER BY nn.job;

ANSWER
==============
JOB   MED_SAL
----- --------
Clerk 13030.50
Sales 16858.20
The next query is the same as the prior, but it uses a sub-query, instead of creating and then
joining to a second temporary table:
WITH numbered_rows AS
  (SELECT s.*
         ,ROW_NUMBER() OVER(PARTITION BY job
                            ORDER     BY salary, id) AS row#
   FROM   staff s
   WHERE  comm > 0
     AND  name LIKE '%e%')
SELECT   job
        ,salary AS med_sal
FROM     numbered_rows
WHERE   (job,row#) IN
        (SELECT   job
                 ,MAX(row# + 1) / 2
         FROM     numbered_rows
         GROUP BY job)
ORDER BY job;

ANSWER
==============
JOB   MED_SAL
----- --------
Clerk 13030.50
Sales 16858.20
Quirks in SQL
One might have noticed by now that not all SQL statements are easy to comprehend. Unfortunately, the situation is perhaps a little worse than you think. In this section we will discuss
some SQL statements that are correct, but which act just a little funny.
When does one timestamp not equal another with the same value? The answer is, when one
value uses a 24 hour notation to represent midnight and the other does not. To illustrate, the
following two timestamp values represent the same point in time, but not according to DB2:
WITH temp1 (c1,t1,t2) AS (VALUES
   ('A'
   ,TIMESTAMP('1996-05-01-24.00.00.000000')
   ,TIMESTAMP('1996-05-02-00.00.00.000000') ))
SELECT c1
FROM   temp1
WHERE  t1 = t2;

ANSWER
=========
<no rows>
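One workaround, sketched here on the assumption that adding a zero-length duration makes DB2 re-express the 24-hour value as midnight of the following day, is to do a little harmless arithmetic on both sides of the comparison:

```sql
WITH temp1 (c1,t1,t2) AS (VALUES
   ('A'
   ,TIMESTAMP('1996-05-01-24.00.00.000000')
   ,TIMESTAMP('1996-05-02-00.00.00.000000') ))
SELECT c1
FROM   temp1
-- Adding zero microseconds changes neither point in time,
-- but the results are then compared in the same notation.
WHERE  t1 + 0 MICROSECONDS = t2 + 0 MICROSECONDS;
```

Be aware that wrapping a column in an expression like this will generally stop DB2 from using an index on that column, so the technique is best kept for small sets of rows.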
ANSWER
======
C1
--
A
One might have to use the 24-hour notation, if one needs to record (in DB2) external actions
that happen just before midnight - with the correct date value. To illustrate, imagine that we
have the following table, which records supermarket sales:
CREATE TABLE supermarket_sales
(sales_ts   TIMESTAMP     NOT NULL
,sales_val  DECIMAL(8,2)  NOT NULL
,PRIMARY KEY(sales_ts));
Now, if we want to select all of the rows that are for a given day, we can write this:
SELECT   *
FROM     supermarket_sales
WHERE    DATE(sales_ts) = '2003-08-01'
ORDER BY sales_ts;

SELECT   *
FROM     supermarket_sales
WHERE    sales_ts BETWEEN '2003-08-01-00.00.00'
                      AND '2003-08-01-24.00.00'
ORDER BY sales_ts;
How many rows are returned by a query when no rows match the provided predicates? The
answer is that sometimes you get none, and sometimes you get one:
SELECT   creator
FROM     sysibm.systables
WHERE    creator = 'ZZZ';

ANSWER
========
<no row>

SELECT   MAX(creator)
FROM     sysibm.systables
WHERE    creator = 'ZZZ';

ANSWER
======
<null>

SELECT   MAX(creator)
FROM     sysibm.systables
WHERE    creator = 'ZZZ'
HAVING   MAX(creator) IS NOT NULL;

ANSWER
========
<no row>

SELECT   MAX(creator)
FROM     sysibm.systables
WHERE    creator      = 'ZZZ'
HAVING   MAX(creator) = 'ZZZ';

ANSWER
========
<no row>

SELECT   MAX(creator)
FROM     sysibm.systables
WHERE    creator = 'ZZZ'
GROUP BY creator;

ANSWER
========
<no row>

SELECT   creator
FROM     sysibm.systables
WHERE    creator = 'ZZZ'
GROUP BY creator;

ANSWER
========
<no row>

SELECT   COUNT(*)
FROM     sysibm.systables
WHERE    creator = 'ZZZ'
GROUP BY creator;

ANSWER
========
<no row>

SELECT   COUNT(*)
FROM     sysibm.systables
WHERE    creator = 'ZZZ';

ANSWER
======
0
When there is no column function (e.g. MAX, COUNT) in the SELECT then, if there are
no matching rows, no row is returned.
If there is a column function in the SELECT, but nothing else, then the query will always
return a row - with zero if the function is a COUNT, and null if it is something else.
If there is a column function in the SELECT, and also a HAVING phrase in the query, a
row will only be returned if the HAVING predicate is true.
If there is a column function in the SELECT, and also a GROUP BY phrase in the query,
a row will only be returned if there was one that matched.
Imagine that one wants to retrieve a list of names from the STAFF table, but when no names
match, one wants to get a row/column with the phrase "NO NAMES", rather than zero rows.
The next query does this by first generating a "not found" row using the SYSDUMMY1 table,
and then left-outer-joining to the set of matching rows in the STAFF table. The COALESCE
function will return the STAFF data, if there is any, else the not-found data:
SELECT   COALESCE(name,noname)  AS nme
        ,COALESCE(salary,nosal) AS sal
FROM    (SELECT 'NO NAME' AS noname
               ,0         AS nosal
         FROM   sysibm.sysdummy1
        )AS nnn
LEFT OUTER JOIN
        (SELECT *
         FROM   staff
         WHERE  id < 5
        )AS xxx
ON       1 = 1
ORDER BY name;

ANSWER
============
NME     SAL
------- ----
NO NAME 0.00
Imagine that you have some character value that you convert to a DB2 date. The correct way
to do it is given below:
SELECT   DATE('2001-09-22')
FROM     sysibm.sysdummy1;

ANSWER
==========
2001-09-22

SELECT   DATE(2001-09-22)
FROM     sysibm.sysdummy1;

ANSWER
==========
0006-05-24
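The odd-looking second result comes from plain arithmetic. Without quotes, 2001-09-22 is not a date string but the expression 2001 minus 9 minus 22, which is the integer 1970. When the DATE function is given an integer "n", it returns the date that is "n-1" days after 0001-01-01. The following sketch (the column names are illustrative) shows the intermediate number alongside both conversions:

```sql
SELECT 2001-09-22         AS num    -- integer arithmetic: 1970
      ,DATE(2001-09-22)   AS dt1    -- day number 1970
      ,DATE('2001-09-22') AS dt2    -- the date that was intended
FROM   sysibm.sysdummy1;
```

Counting forward, the first five years contain 1,826 days, so day number 1970 is the 144th day of year six, which is 0006-05-24.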
The following query was written with intentions of getting a single random row out of the
matching set in the STAFF table. Unfortunately, it returned two rows:
SELECT   id
        ,name
FROM     staff
WHERE    id <= 100
  AND    id  = (INT(RAND()* 10) * 10) + 10
ORDER BY id;

ANSWER
===========
ID NAME
-- --------
30 Marenghi
60 Quigley
ANSWER
====================
ID  NAME     RAN EQL
--- -------- --- ---
 10 Sanders   10 Y
 20 Pernal    30
 30 Marenghi  70
 40 O'Brien   10
 50 Hanes     30
 60 Quigley   40
 70 Rothman   30
 80 James    100
 90 Koonitz   40
100 Plotz    100 Y
There are several ways to always get exactly "n" random rows from a set of matching rows.
In the following example, three rows are required:
WITH
staff_numbered AS
  (SELECT s.*
         ,ROW_NUMBER() OVER() AS row#
   FROM   staff s
   WHERE  id <= 100
  ),
count_rows AS
  (SELECT MAX(row#) AS #rows
   FROM   staff_numbered
  ),
random_values (RAN#) AS
  (VALUES (RAND())
         ,(RAND())
         ,(RAND())
  ),
rows_t0_get AS
  (SELECT INT(ran# * #rows) + 1 AS get_row
   FROM   random_values
         ,count_rows
  )
SELECT   id
        ,name
FROM     staff_numbered
        ,rows_t0_get
WHERE    row# = get_row
ORDER BY id;

ANSWER
===========
ID  NAME
--- -------
 10 Sanders
 20 Pernal
 90 Koonitz
First, the matching rows in the STAFF table are assigned a row number.
Fourth, the three random values are joined to the row-count value, resulting in three new
row-number values (of type integer) within the correct range.
Finally, the three row-number values are joined to the original temporary table.
If more than a small number of random rows are required, the random values cannot be
defined using the VALUES phrase. Some recursive code can do the job.
In the extremely unlikely event that the RAND function returns the value "one", no row
will match. CASE logic can be used to address this issue.
Ignoring the problem just mentioned, the above query will always return three rows, but
the rows may not be different rows. Depending on what the three RAND calls generate,
the query may even return just one row - repeated three times.
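The first two points above can be sketched as follows. Recursion generates as many random values as are wanted (25 here), and CASE logic caps the result when RAND returns exactly one. The row count of 10 is hard-coded purely for illustration, where a real query would join to a row-count table as above, and the name rows_to_get is likewise an assumption:

```sql
WITH
random_values (num, ran#) AS
  (VALUES (1, RAND())
   UNION ALL
   SELECT num + 1                  -- one more random value per loop
         ,RAND()
   FROM   random_values
   WHERE  num < 25),
rows_to_get AS
  (SELECT num
         ,CASE
             WHEN ran# >= 1 THEN 10   -- guard: RAND returned one
             ELSE INT(ran# * 10) + 1  -- normal case: 1 through 10
          END AS get_row
   FROM   random_values)
SELECT   num
        ,get_row
FROM     rows_to_get
ORDER BY num;
```

As with the VALUES version, nothing here stops the same row-number from being generated more than once.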
In contrast to the above query, the following will always return three different random rows:
SELECT   id
        ,name
FROM    (SELECT s.*
               ,ROW_NUMBER() OVER(ORDER BY RAND()) AS r
         FROM   staff s
         WHERE  id <= 100
        )AS xxx
WHERE    r <= 3
ORDER BY id;

ANSWER
===========
ID NAME
-- --------
10 Sanders
40 O'Brien
60 Quigley
The lesson to be learnt here is that one must consider exactly how random one wants to be
when one goes searching for a set of random rows:
Does one want the number of rows returned to be also somewhat random?
Does one want exactly "n" rows, but it is OK to get the same row twice?
Does one want exactly "n" distinct (i.e. different) random rows?
Date/Time Manipulation
I once had a table that contained two fields - the timestamp when an event began, and the
elapsed time of the event. To get the end-time of the event, I added the elapsed time to the
begin-timestamp - as in the following SQL:
WITH temp1 (bgn_tstamp, elp_sec) AS
  (VALUES (TIMESTAMP('2001-01-15-01.02.03.000000'), 1.234)
         ,(TIMESTAMP('2001-01-15-01.02.03.123456'), 1.234)
  )
SELECT bgn_tstamp
      ,elp_sec
      ,bgn_tstamp + elp_sec SECONDS AS end_tstamp
FROM   temp1;

ANSWER
======
BGN_TSTAMP                 ELP_SEC END_TSTAMP
-------------------------- ------- --------------------------
2001-01-15-01.02.03.000000   1.234 2001-01-15-01.02.04.000000
2001-01-15-01.02.03.123456   1.234 2001-01-15-01.02.04.123456
ANSWER
======
BGN_TSTAMP                 ELP_SEC END_TSTAMP
-------------------------- ------- --------------------------
2001-01-15-01.02.03.000000   1.234 2001-01-15-01.02.04.234000
2001-01-15-01.02.03.123456   1.234 2001-01-15-01.02.04.357456
When one has a fractional date/time value (e.g. 5.1 days, 4.2 hours, or 3.1 seconds) that is for
a period of fixed length that one wants to use in a date/time calculation, then one has to convert the value into some whole number of a more precise time period. Thus 5.1 days times
86,400 will give one the equivalent number of seconds and 6.2 seconds times 1E6 (i.e. one
million) will give one the equivalent number of microseconds.
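A minimal sketch of this conversion, using the same two sample rows as before: the elapsed seconds are multiplied by one million and added as microseconds, so the fractional part is no longer thrown away. An integer multiplier is used here, rather than 1E6, so that the arithmetic stays in decimal:

```sql
WITH temp1 (bgn_tstamp, elp_sec) AS
  (VALUES (TIMESTAMP('2001-01-15-01.02.03.000000'), 1.234)
         ,(TIMESTAMP('2001-01-15-01.02.03.123456'), 1.234)
  )
SELECT bgn_tstamp
      ,elp_sec
       -- 1.234 seconds becomes 1,234,000 microseconds
      ,bgn_tstamp + (elp_sec * 1000000) MICROSECONDS AS end_tstamp
FROM   temp1;
```

The same approach works for days: multiply by 86,400 and add the result as SECONDS.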
Use of LIKE on VARCHAR
Sometimes one value can be EQUAL to another, but is not LIKE the same. To illustrate, the
following SQL refers to two fields of interest, one CHAR, and the other VARCHAR. Observe below that both rows in these two fields are seemingly equal:
WITH temp1 (c0,c1,v1) AS (VALUES
   ('A',CHAR(' ',1),VARCHAR(' ',1)),
   ('B',CHAR(' ',1),VARCHAR('' ,1)))
SELECT c0
FROM   temp1
WHERE  c1 = v1
  AND  c1 LIKE ' ';

ANSWER
======
C0
--
A
B

ANSWER
======
C0
--
A
the LIKE check does not pad VARCHAR fields with blanks. So the LIKE test in the second
SQL statement only matched on one row.
The RTRIM function can be used to remove all trailing blanks and so get around this problem:
WITH temp1 (c0,c1,v1) AS (VALUES
   ('A',CHAR(' ',1),VARCHAR(' ',1)),
   ('B',CHAR(' ',1),VARCHAR('' ,1)))
SELECT c0
FROM   temp1
WHERE  c1 = v1
  AND  RTRIM(v1) LIKE '';

ANSWER
======
C0
--
A
B
One often wants to compare what happened in part of one year against the same period in
another year. For example, one might compare January sales over a decade period. This may
be a perfectly valid thing to do when comparing whole months, but it rarely makes sense
when comparing weeks or individual days.
The problem with comparing weeks from one year to the next is that the same week (as defined by DB2) rarely encompasses the same set of days. The following query illustrates this
point by showing the set of days that make up week 33 over a ten-year period. Observe that
some years have almost no overlap with the next:
WITH temp1 (yymmdd) AS
  (VALUES DATE('2000-01-01')
   UNION ALL
   SELECT yymmdd + 1 DAY
   FROM   temp1
   WHERE  yymmdd < '2010-12-31'
  )
SELECT   yy                     AS year
        ,CHAR(MIN(yymmdd),ISO)  AS min_dt
        ,CHAR(MAX(yymmdd),ISO)  AS max_dt
FROM    (SELECT yymmdd
               ,YEAR(yymmdd) yy
               ,WEEK(yymmdd) wk
         FROM   temp1
         WHERE  WEEK(yymmdd) = 33
        )AS xxx
GROUP BY yy
        ,wk;

ANSWER
==========================
YEAR MIN_DT     MAX_DT
---- ---------- ----------
2000 2000-08-06 2000-08-12
2001 2001-08-12 2001-08-18
2002 2002-08-11 2002-08-17
2003 2003-08-10 2003-08-16
2004 2004-08-08 2004-08-14
2005 2005-08-07 2005-08-13
2006 2006-08-13 2006-08-19
2007 2007-08-12 2007-08-18
2008 2008-08-10 2008-08-16
2009 2009-08-09 2009-08-15
2010 2010-08-08 2010-08-14
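For comparison, the same check can be run with the WEEK_ISO function instead of WEEK. The query below is a sketch only; the MIN_DAY column is an addition for illustration. An ISO week still covers different calendar dates from year to year, so the underlying problem remains:

```sql
WITH temp1 (yymmdd) AS
  (VALUES DATE('2000-01-01')
   UNION ALL
   SELECT yymmdd + 1 DAY
   FROM   temp1
   WHERE  yymmdd < '2010-12-31'
  )
SELECT   YEAR(yymmdd)           AS year
        ,CHAR(MIN(yymmdd),ISO)  AS min_dt
        ,CHAR(MAX(yymmdd),ISO)  AS max_dt
        ,DAYNAME(MIN(yymmdd))   AS min_day   -- day on which the week starts
FROM     temp1
WHERE    WEEK_ISO(yymmdd) = 33
GROUP BY YEAR(yymmdd);
```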
When converting from one numeric type to another where there is a loss of precision, DB2
always truncates; it does not round. For this reason, the S1 result below is not equal to the S2 result:
SELECT SUM(INTEGER(salary)) AS s1
      ,INTEGER(SUM(salary)) AS s2
FROM   staff;

ANSWER
=============
S1     S2
------ ------
583633 583647

Rounding each value before it is converted brings the two results closer together:

SELECT SUM(INTEGER(ROUND(salary,-1))) AS s1
      ,INTEGER(SUM(salary))           AS s2
FROM   staff;

ANSWER
=============
S1     S2
------ ------
583640 583647
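The difference between truncating and rounding is easiest to see on a single value. The following sketch uses a made-up value of 12.99, with table and column names chosen only for illustration. The plain conversion should give 12, while rounding to zero decimal places first should give 13:

```sql
SELECT INTEGER(d1)          AS truncated   -- fraction is simply dropped
      ,INTEGER(ROUND(d1,0)) AS rounded     -- rounded to a whole number first
FROM  (VALUES DECIMAL(12.99,5,2)) AS xxx (d1);
```

Note that a second ROUND argument of zero rounds to the nearest whole number, whereas minus one rounds to the nearest ten.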
The CASE WHEN checks are processed in the order in which they are written. The first one that
matches is the one used. To illustrate, the following statement will always return the value
'FEM' in the SXX field:

SELECT   lastname
        ,sex
        ,CASE
           WHEN sex >= 'F' THEN 'FEM'
           WHEN sex >= 'M' THEN 'MAL'
         END AS sxx
FROM     employee
WHERE    lastname LIKE 'J%'
ORDER BY 1;

ANSWER
=================
LASTNAME   SX SXX
---------- -- ---
JEFFERSON  M  FEM
JOHNSON    F  FEM
JONES      M  FEM

Reversing the order of the two checks gives the intended result, because the more restrictive check is now tested first:

SELECT   lastname
        ,sex
        ,CASE
           WHEN sex >= 'M' THEN 'MAL'
           WHEN sex >= 'F' THEN 'FEM'
         END AS sxx
FROM     employee
WHERE    lastname LIKE 'J%'
ORDER BY 1;

ANSWER
=================
LASTNAME   SX SXX
---------- -- ---
JEFFERSON  M  MAL
JOHNSON    F  FEM
JONES      M  MAL
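Where the checks are really equality tests, one way to make the result independent of their order is to use the simple form of the CASE expression. This is a sketch, not a rewrite of the author's example; the ELSE value is a placeholder added for illustration:

```sql
SELECT   lastname
        ,sex
        ,CASE sex
           WHEN 'F' THEN 'FEM'
           WHEN 'M' THEN 'MAL'
           ELSE          '???'   -- placeholder for any other value
         END AS sxx
FROM     employee
WHERE    lastname LIKE 'J%'
ORDER BY 1;
```

Because each WHEN now matches exactly one value, the two checks can be listed in either order.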
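The ratio of two averages is not the same as the average of the individual ratios, so the two expressions below return different values:

SELECT AVG(salary) / AVG(comm) AS a1
      ,AVG(salary / comm)      AS a2
FROM   staff;

ANSWER >>>   A1  A2
             --  -----
             32  61.98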
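A two-row example makes the arithmetic easy to follow. The values below are invented for illustration. The first expression works out to 100 / 30 (roughly 3.33), while the second is the average of 10 and 2 (i.e. 6):

```sql
WITH temp1 (salary, comm) AS
  (VALUES (DECIMAL(100,7,2), DECIMAL(10,7,2))
         ,(DECIMAL(100,7,2), DECIMAL(50,7,2)))
SELECT AVG(salary) / AVG(comm) AS a1   -- ratio of the two averages
      ,AVG(salary / comm)      AS a2   -- average of the per-row ratios
FROM   temp1;
```

Be aware also that AVG ignores null values. If COMM is null in some rows, the two averages in the first expression are calculated over different sets of rows.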
DB2 has a bind option (called DATETIME) that specifies the default output format of datetime data. This bind option has no impact on the sequence with which date-time data is presented. It simply defines the output template used. To illustrate, the plan that was used to run
the following SQL defaults to the USA date-time-format bind option. Observe that the month
is the first field printed, but the rows are sequenced by year:
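SELECT   hiredate
FROM     employee
WHERE    hiredate < '1960-01-01'
ORDER BY 1;

ANSWER
==========
1947-05-05
1949-08-17
1958-05-16

When the CHAR function is used to convert the date to a character string, the rows are sequenced by the characters displayed, not by the underlying date value:

SELECT   CHAR(hiredate,USA)
FROM     employee
WHERE    hiredate < '1960-01-01'
ORDER BY 1;

ANSWER
==========
05/05/1947
05/16/1958
08/17/1949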
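If the USA display format is wanted but the rows should still come back in date order, one option is to sort on the underlying date column rather than on the character value. The following is a sketch; the HIRE_USA column name is made up:

```sql
SELECT   CHAR(hiredate,USA) AS hire_usa   -- display value, month first
FROM     employee
WHERE    hiredate < '1960-01-01'
ORDER BY hiredate;                        -- sequence by the real date
```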
The following pseudo-code will fetch all of the rows in the STAFF table (which has IDs
ranging from 10 to 350) and then, while still fetching, insert new rows into the same STAFF
table that are the same as those already there, but with IDs that are 500 larger.

EXEC-SQL
   DECLARE fred CURSOR FOR
   SELECT   *
   FROM     staff
   WHERE    id < 1000
   ORDER BY id;
END-EXEC;

EXEC-SQL
   OPEN fred
END-EXEC;

DO UNTIL SQLCODE = 100;

   EXEC-SQL
      FETCH fred
      INTO  :HOST-VARS
   END-EXEC;

   IF SQLCODE <> 100 THEN DO;
      SET HOST-VARS.ID = HOST-VARS.ID + 500;
      EXEC-SQL
         INSERT INTO staff VALUES (:HOST-VARS)
      END-EXEC;
   END-DO;

END-DO;

EXEC-SQL
   CLOSE fred
END-EXEC;
Be aware that DB2, unlike some other database products, does NOT (always) retrieve all of
the matching rows at OPEN CURSOR time. Furthermore, understand that this is a good thing,
for it means that DB2 (usually) does not process any row that you do not need.
DB2 is very good at always returning the same answer, regardless of the access path used. It
is equally good at giving consistent results when the same logical statement is written in a
different manner (e.g. A=B vs. B=A). What it has never done consistently (and never will) is
guarantee that concurrent read and write statements (being run by the same user) will always
give the same results.
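One way to sidestep the ambiguity is to do the work in a single statement. The sketch below assumes that the goal is simply to copy every existing row once, with an ID that is 500 larger. When the target of an INSERT is also referenced in its fullselect, DB2 evaluates the fullselect completely before any rows are inserted, so the new rows cannot be picked up by the same statement:

```sql
INSERT INTO staff
SELECT id + 500      -- new key, 500 larger than the original
      ,name
      ,dept
      ,job
      ,years
      ,salary
      ,comm
FROM   staff
WHERE  id < 1000;
```

The column list is that of the STAFF sample table, as shown in the appendix.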
Floating Point Numbers
DECIMAL1              BIGINT1
--------------------  -------------------
                  1.                    1
                 12.                   12
                123.                  123
               1234.                 1234
              12345.                12345
             123456.               123456
            1234567.              1234567
           12345678.             12345678
          123456789.            123456788
         1234567890.           1234567889
        12345678900.          12345678899
       123456789000.         123456788999
      1234567890000.        1234567889999
     12345678900000.       12345678899999
    123456789000000.      123456788999999
   1234567890000000.     1234567889999999
  12345678900000000.    12345678899999998
 123456789000000000.   123456788999999984
1234567890000000000.  1234567889999999744

Figure 961, Two numbers that look equal, but aren't equal
We can use the HEX function to show that, internally, the two numbers being compared
above are not equal:
WITH temp (f1,f2) AS
  (VALUES (FLOAT(1.23456789E1 * 10 * 10 * 10 * 10 * 10 * 10 * 10)
          ,FLOAT(1.23456789E8)))
SELECT HEX(f1) AS hex_f1
      ,HEX(f2) AS hex_f2
FROM   temp
WHERE  f1 <> f2;

ANSWER
=================================
HEX_F1           HEX_F2
---------------- ----------------
FFFFFF53346F9D41 00000054346F9D41

Figure 962, Two numbers that look equal, but aren't equal, shown in HEX
Now we can explain what is going on in the recursive code shown at the start of this section.
The same value is displayed using three different methods:
The floating-point representation (on the left) is really a decimal approximation (done
using rounding) of the underlying binary value.
When the floating-point data was converted to decimal (in the middle), it was rounded
using the same method that is used when it is displayed directly.
When the floating-point data was converted to bigint (on the right), no rounding was
done because both formats hold binary values.
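Given this behavior, floating-point values are usually better compared within a tolerance than for exact equality. The following sketch reuses the two values from the HEX example. The relative tolerance of 1E-9 is an arbitrary choice made for illustration, and the column names are made up:

```sql
WITH temp (f1,f2) AS
  (VALUES (FLOAT(1.23456789E1 * 10 * 10 * 10 * 10 * 10 * 10 * 10)
          ,FLOAT(1.23456789E8)))
SELECT CASE
         WHEN f1 = f2 THEN 'SAME'
         ELSE              'DIFF'
       END AS exact_chk     -- exact comparison of the binary values
      ,CASE
         WHEN ABS(f1 - f2) <= ABS(f2) * 1E-9 THEN 'SAME'
         ELSE                                     'DIFF'
       END AS approx_chk    -- comparison within a relative tolerance
FROM   temp;
```

How tight the tolerance should be depends on the data, so treat the figure used here as a starting point only.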
In any computer-based number system, when you do division, you can get imprecise results
due to rounding. For example, when you divide 1 by 3 you get "one third", which cannot be
stored accurately in either a decimal or a binary number system. Because the two formats store numbers
differently internally, dividing the same number in floating-point vs. decimal can give
different results. Here is an example:

WITH
temp1 (dec1, dbl1) AS
  (VALUES (DECIMAL(1),DOUBLE(1)))
,temp2 (dec1, dec2, dbl1, dbl2) AS
  (SELECT dec1
         ,dec1 / 3 AS dec2
         ,dbl1
         ,dbl1 / 3 AS dbl2
   FROM   temp1)
SELECT *
FROM   temp2
WHERE  dbl2 <> dec2;
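If the two results only need to agree to a known number of decimal places, one approach is to convert both to the same decimal precision before comparing. The sketch below does this with ten decimal places, a figure chosen purely for illustration. Both sides should then reduce to 0.3333333333:

```sql
WITH
temp1 (dec1, dbl1) AS
  (VALUES (DECIMAL(1),DOUBLE(1)))
,temp2 (dec2, dbl2) AS
  (SELECT dec1 / 3
         ,dbl1 / 3
   FROM   temp1)
SELECT dec2
      ,dbl2
      ,CASE
         WHEN DECIMAL(dec2,15,10) = DECIMAL(dbl2,15,10) THEN 'SAME'
         ELSE                                                'DIFF'
       END AS chk_10_digits   -- compare after cutting to 10 decimals
FROM   temp2;
```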
D1                     COMPARE
---------------------  -------
         1.2345678900  SAME
        12.3456789000  SAME
       123.4567890000  DIFF
      1234.5678900000  DIFF
     12345.6789000000  DIFF
    123456.7890000000  DIFF
   1234567.8900000000  SAME
  12345678.9000000000  DIFF
 123456789.0000000000  DIFF
1234567890.0000000000  DIFF
ANSWER
=======================================
F1                     HEX_F1
---------------------- ----------------
+1.00000000000000E-001 9A9999999999B93F
SELECT db2/v5 AS answer
FROM   damn_lawyers;

ANSWER
------
     0
   617
Appendix
DB2 Sample Tables
Class Schedule
CREATE TABLE CL_SCHED
(CLASS_CODE   CHARACTER (00007)
,DAY          SMALLINT
,STARTING     TIME
,ENDING       TIME);
(00003)
(00029)
(00006)
(00003)
(00016)
NOT NULL
NOT NULL
NOT NULL
DEPTNAME
----------------------------SPIFFY COMPUTER SERVICE DIV.
PLANNING
INFORMATION CENTER
DEVELOPMENT CENTER
MANUFACTURING SYSTEMS
ADMINISTRATION SYSTEMS
SUPPORT SERVICES
OPERATIONS
SOFTWARE SUPPORT
MGRNO
-----000010
000020
000030
000060
000070
000050
000090
000100
ADMRDEPT
-------A00
A00
A00
A00
D01
D01
A00
E01
E01
LOCATION
----------------
(00006)
(00012)
(00001)
(00015)
(00003)
(00004)
NOT
NOT
NOT
NOT
NULL
NULL
NULL
NULL
(00008)
NOT NULL
(00001)
(09,02)
(09,02)
(09,02)
EMPNO
-----000010
000020
000030
000050
000060
000070
000090
000100
000110
000120
000130
000140
000150
000160
000170
000180
000190
000200
000210
000220
000230
000240
000250
000260
000270
000280
000290
000300
000310
000320
000330
000340
FIRSTNME
--------CHRISTINE
MICHAEL
SALLY
JOHN
IRVING
EVA
EILEEN
THEODORE
VINCENZO
SEAN
DOLORES
HEATHER
BRUCE
ELIZABETH
MASATOSHI
MARILYN
JAMES
DAVID
WILLIAM
JENNIFER
JAMES
SALVATORE
DANIEL
SYBIL
MARIA
ETHEL
JOHN
PHILIP
MAUDE
RAMLAL
WING
JASON
M
I
L
A
B
F
D
W
Q
G
M
A
R
J
S
H
T
K
J
M
S
P
L
R
R
X
F
V
R
LASTNAME
--------HAAS
THOMPSON
KWAN
GEYER
STERN
PULASKI
HENDERSON
SPENSER
LUCCHESSI
OCONNELL
QUINTANA
NICHOLLS
ADAMSON
PIANKA
YOSHIMURA
SCOUTTEN
WALKER
BROWN
JONES
LUTZ
JEFFERSON
MARINO
SMITH
JOHNSON
PEREZ
SCHNEIDER
PARKER
SMITH
SETRIGHT
MEHTA
LEE
GOUNOT
WKD
--A00
B01
C01
E01
D11
D21
E11
E21
A00
A00
C01
C01
D11
D11
D11
D11
D11
D11
D11
D11
D21
D21
D21
D21
D21
E11
E11
E11
E11
E21
E21
E21
PH#
---3978
3476
4738
6789
6423
7831
5498
0972
3490
2167
4578
1793
4510
3782
2890
1682
2986
4501
0942
0672
2094
3780
0961
8953
9001
8997
4502
2095
3332
9990
2103
5698
HIREDATE
---------1965-01-01
1973-10-10
1975-04-05
1949-08-17
1973-09-14
1980-09-30
1970-08-15
1980-06-19
1958-05-16
1963-12-05
1971-07-28
1976-12-15
1972-02-12
1977-10-11
1978-09-15
1973-07-07
1974-07-26
1966-03-03
1979-04-11
1968-08-29
1966-11-21
1979-12-05
1969-10-30
1975-09-11
1980-09-30
1967-03-24
1980-05-30
1972-06-19
1964-09-12
1965-07-07
1976-02-23
1947-05-05
JOB
-------PRES
MANAGER
MANAGER
MANAGER
MANAGER
MANAGER
MANAGER
MANAGER
SALESREP
CLERK
ANALYST
ANALYST
DESIGNER
DESIGNER
DESIGNER
DESIGNER
DESIGNER
DESIGNER
DESIGNER
DESIGNER
CLERK
CLERK
CLERK
CLERK
CLERK
OPERATOR
OPERATOR
OPERATOR
OPERATOR
FIELDREP
FIELDREP
FIELDREP
ED
-18
18
20
16
16
16
16
14
19
14
16
18
16
17
16
17
16
16
17
18
14
17
15
16
15
17
12
14
12
16
14
16
S
F
M
F
M
M
F
F
M
M
M
F
F
M
F
M
F
M
M
M
F
M
M
M
F
F
F
M
M
F
M
M
M
BIRTHDTE
-------19330824
19480202
19410511
19250915
19450707
19530526
19410515
19561218
19291105
19421018
19250915
19460119
19470517
19550412
19510105
19490221
19520625
19410529
19530223
19480319
19350530
19540331
19391112
19361005
19530526
19360328
19460709
19361027
19310421
19320811
19410718
19260517
SALRY
----52750
41250
38250
40175
32250
36170
29750
26150
46500
29250
23800
28420
25280
22250
24680
21340
20450
27740
18270
29840
22180
28760
19180
17250
27380
26250
15340
17750
15900
19950
25370
23840
BNS
--1K
800
800
800
500
700
600
500
900
600
500
600
500
400
500
500
400
600
400
600
400
600
400
300
500
500
300
400
300
400
500
500
COMM
---4220
3300
3060
3214
2580
2893
2380
2092
3720
2340
1904
2274
2022
1780
1974
1707
1636
2217
1462
2387
1774
2301
1534
1380
2190
2100
1227
1420
1272
1596
2030
1907
(00006)
(00006)
NOT NULL
NOT NULL
NOT NULL
(05,02)
PROJNO
-----AD3100
MA2100
MA2110
PL2100
PL2100
IF1000
IF2000
OP1000
OP2010
AD3110
OP1010
OP2010
MA2100
IF1000
IF1000
ACTNO
----10
10
10
30
30
10
10
10
10
10
10
10
20
90
100
EMPTIME
------0.50
0.50
1.00
1.00
1.00
0.50
0.50
0.25
0.75
1.00
1.00
1.00
1.00
1.00
0.50
EMSTDATE
---------1982-01-01
1982-01-01
1982-01-01
1982-01-01
1982-01-01
1982-06-01
1982-01-01
1982-01-01
1982-01-01
1982-01-01
1982-01-01
1982-01-01
1982-01-01
1982-01-01
1982-10-01
EMENDATE
---------1982-07-01
1982-11-01
1983-02-01
1982-09-15
1982-09-15
1983-01-01
1983-01-01
1983-02-01
1983-02-01
1983-02-01
1983-02-01
1983-02-01
1982-03-01
1982-10-01
1983-01-01
EMPNO
-----000140
000140
000140
000140
000140
000150
000150
000160
000170
000170
000170
000180
000190
000190
000200
000200
000210
000210
000220
000230
000230
000230
000230
000230
000240
000240
000250
000250
000250
000250
000250
000250
000250
000250
000250
000250
000260
000260
000260
000260
000260
000260
000260
000270
000270
000270
000270
000270
000270
000270
000280
000290
000300
000310
000320
000320
000330
000330
000340
000340
PROJNO
-----IF1000
IF2000
IF2000
IF2000
IF2000
MA2112
MA2112
MA2113
MA2112
MA2112
MA2113
MA2113
MA2112
MA2112
MA2111
MA2111
MA2113
MA2113
MA2111
AD3111
AD3111
AD3111
AD3111
AD3111
AD3111
AD3111
AD3112
AD3112
AD3112
AD3112
AD3112
AD3112
AD3112
AD3112
AD3112
AD3112
AD3113
AD3113
AD3113
AD3113
AD3113
AD3113
AD3113
AD3113
AD3113
AD3113
AD3113
AD3113
AD3113
AD3113
OP1010
OP1010
OP1010
OP1010
OP2011
OP2011
OP2012
OP2012
OP2013
OP2013
ACTNO
----90
100
100
110
110
60
180
60
60
70
80
70
70
80
50
60
80
180
40
60
60
70
80
180
70
80
60
60
60
60
70
70
70
80
80
180
70
70
80
80
180
180
180
60
60
60
70
70
80
80
130
130
130
130
140
150
140
160
140
170
EMPTIME
------0.50
0.50
1.00
0.50
0.50
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
1.00
0.50
0.50
1.00
0.50
1.00
0.50
0.50
1.00
1.00
1.00
0.50
0.50
1.00
1.00
0.25
0.50
1.00
0.25
0.50
0.50
0.50
1.00
0.50
1.00
0.50
0.50
1.00
0.25
0.50
1.00
0.75
1.00
0.50
1.00
1.00
1.00
1.00
1.00
0.75
0.25
0.25
0.75
0.50
0.50
EMSTDATE
---------1982-10-01
1982-03-01
1982-01-01
1982-03-01
1982-10-01
1982-01-01
1982-07-15
1982-07-15
1982-01-01
1982-06-01
1982-01-01
1982-04-01
1982-02-01
1982-10-01
1982-01-01
1982-06-15
1982-10-01
1982-10-01
1982-01-01
1982-03-15
1982-01-01
1982-03-15
1982-04-15
1982-10-15
1982-02-15
1982-09-15
1982-02-01
1982-12-01
1982-01-01
1983-01-01
1982-08-15
1982-02-01
1982-03-15
1982-08-15
1982-10-15
1982-08-15
1982-06-15
1982-07-01
1982-03-01
1982-01-01
1982-03-01
1982-06-01
1982-04-15
1982-09-01
1982-03-01
1982-04-01
1982-09-01
1982-10-15
1982-03-01
1982-01-01
1982-01-01
1982-01-01
1982-01-01
1982-01-01
1982-01-01
1982-01-01
1982-01-01
1982-01-01
1982-01-01
1982-01-01
EMENDATE
---------1983-01-01
1982-07-01
1982-03-01
1982-07-01
1983-01-01
1982-07-15
1983-02-01
1983-02-01
1983-06-01
1983-02-01
1983-02-01
1982-06-15
1982-10-01
1983-10-01
1982-06-15
1983-02-01
1983-02-01
1983-02-01
1983-02-01
1982-04-15
1982-03-15
1982-10-15
1982-10-15
1983-01-01
1982-09-15
1983-01-01
1982-03-15
1983-01-01
1982-02-01
1983-02-01
1982-10-15
1982-03-15
1982-08-15
1982-10-15
1982-12-01
1983-01-01
1982-07-01
1983-02-01
1982-04-15
1982-03-01
1982-04-15
1982-07-01
1982-06-01
1982-10-15
1982-04-01
1982-09-01
1982-10-15
1983-02-01
1982-04-01
1982-03-01
1983-02-01
1983-02-01
1983-02-01
1983-02-01
1983-02-01
1983-02-01
1983-02-01
1983-02-01
1983-02-01
1983-02-01
Employee Photo
CREATE TABLE EMP_PHOTO
(EMPNO         CHARACTER (00006)   NOT NULL
,PHOTO_FORMAT  VARCHAR   (00010)   NOT NULL
,PICTURE       BLOB      (0100)K
,PRIMARY KEY(EMPNO,PHOTO_FORMAT));
PHOTO_FORMAT PICTURE
------------ -------------
bitmap       <<NOT SHOWN>>
gif          <<NOT SHOWN>>
xwd          <<NOT SHOWN>>
bitmap       <<NOT SHOWN>>
gif          <<NOT SHOWN>>
xwd          <<NOT SHOWN>>
bitmap       <<NOT SHOWN>>
gif          <<NOT SHOWN>>
xwd          <<NOT SHOWN>>
bitmap       <<NOT SHOWN>>
gif          <<NOT SHOWN>>
xwd          <<NOT SHOWN>>
NOT NULL
NOT NULL
RESUME_FORMAT
------------ascii
script
ascii
script
ascii
script
ascii
script
RESUME
------------<<NOT SHOWN>>
<<NOT SHOWN>>
<<NOT SHOWN>>
<<NOT SHOWN>>
<<NOT SHOWN>>
<<NOT SHOWN>>
<<NOT SHOWN>>
<<NOT SHOWN>>
(00008)
(00064)
(03000));
Organization
CREATE TABLE ORG
(DEPTNUMB  SMALLINT            NOT NULL
,DEPTNAME  VARCHAR   (00014)
,MANAGER   SMALLINT
,DIVISION  VARCHAR   (00010)
,LOCATION  VARCHAR   (00013)
,PRIMARY KEY(DEPTNUMB));
DEPTNAME       MANAGER DIVISION   LOCATION
-------------- ------- ---------- -------------
Head Office        160 Corporate  New York
New England         50 Eastern    Boston
Mid Atlantic        10 Eastern    Washington
South Atlantic      30 Eastern    Atlanta
Great Lakes        100 Midwest    Chicago
Plains             140 Midwest    Dallas
Pacific            270 Western    San Francisco
Mountain           290 Western    Denver
(00006)
(00024)
(00003)
(00006)
(05,02)
NOT
NOT
NOT
NOT
NULL
NULL
NULL
NULL
(00006)
PROJNAME
---------------------ADMIN SERVICES
GENERAL ADMIN SYSTEMS
PAYROLL PROGRAMMING
PERSONNEL PROGRAMMING
ACCOUNT PROGRAMMING
QUERY SERVICES
USER EDUCATION
WELD LINE AUTOMATION
W L PROGRAMMING
W L PROGRAM DESIGN
W L ROBOT DESIGN
W L PROD CONT PROGS
OPERATION SUPPORT
OPERATION
GEN SYSTEMS SERVICES
SYSTEMS SUPPORT
SCP SYSTEMS SUPPORT
APPLICATIONS SUPPORT
DB/DC SUPPORT
WELD LINE PLANNING
DP#
--D01
D21
D21
D21
D21
C01
C01
D01
D11
D11
D11
D11
E01
E11
E01
E21
E21
E21
E21
B01
Sales
CREATE TABLE SALES
(SALES_DATE    DATE
,SALES_PERSON  VARCHAR   (00015)
,REGION        VARCHAR   (00015)
,SALES         INTEGER);
SALES_PERSON
--------------GOUNOT
LEE
LEE
LEE
LUCCHESSI
GOUNOT
GOUNOT
GOUNOT
LEE
LEE
LEE
LEE
LUCCHESSI
LUCCHESSI
GOUNOT
GOUNOT
GOUNOT
LEE
LEE
LEE
LEE
LUCCHESSI
LUCCHESSI
LUCCHESSI
GOUNOT
GOUNOT
LEE
LEE
LEE
LEE
LUCCHESSI
GOUNOT
GOUNOT
GOUNOT
GOUNOT
LEE
LEE
LEE
LEE
LUCCHESSI
LUCCHESSI
REGION
SALES
--------------- ----Quebec
Manitoba
Ontario-South
Quebec
Ontario-South
Manitoba
Ontario-South
Quebec
Manitoba
Ontario-North
Ontario-South
Quebec
Ontario-South
Quebec
Manitoba
Ontario-South
Quebec
Manitoba
Ontario-North
Ontario-South
Quebec
Manitoba
Ontario-South
Quebec
Ontario-South
Quebec
Manitoba
Ontario-North
Ontario-South
Quebec
Manitoba
Manitoba
Ontario-North
Ontario-South
Quebec
Manitoba
Ontario-North
Ontario-South
Quebec
Manitoba
Ontario-South
1
2
3
1
1
7
3
1
5
2
2
3
3
1
1
2
18
4
3
7
7
1
1
2
2
1
3
3
14
7
1
7
1
3
3
9
8
8
1
3
SMALLINT
VARCHAR
SMALLINT
CHARACTER
SMALLINT
DECIMAL
DECIMAL
NOT NULL
(00009)
(00005)
(07,02)
(07,02)
ID
-----10
20
30
40
50
60
70
80
90
100
110
120
130
140
150
160
170
180
190
200
210
220
230
240
250
260
270
280
290
300
310
320
330
340
350
NAME
--------Sanders
Pernal
Marenghi
OBrien
Hanes
Quigley
Rothman
James
Koonitz
Plotz
Ngan
Naughton
Yamaguchi
Fraye
Williams
Molinare
Kermisch
Abrahams
Sneider
Scoutten
Lu
Smith
Lundquist
Daniels
Wheeler
Jones
Lea
Wilson
Quill
Davis
Graham
Gonzales
Burke
Edwards
Gafney
DEPT
-----20
20
38
38
15
38
15
20
42
42
15
38
42
51
51
10
15
38
20
42
10
51
51
10
51
10
66
66
84
84
66
66
66
84
84
JOB
----Mgr
Sales
Mgr
Sales
Mgr
Sales
Sales
Clerk
Sales
Mgr
Clerk
Clerk
Clerk
Mgr
Sales
Mgr
Clerk
Clerk
Clerk
Clerk
Mgr
Sales
Clerk
Mgr
Clerk
Mgr
Mgr
Sales
Mgr
Sales
Sales
Sales
Clerk
Sales
Clerk
YEARS
-----7
8
5
6
10
7
6
7
5
6
6
6
7
4
3
8
10
7
3
5
6
12
9
9
10
5
13
4
1
7
5
SALARY
--------18357.50
18171.25
17506.75
18006.00
20659.80
16808.30
16502.83
13504.60
18001.75
18352.80
12508.20
12954.75
10505.90
21150.00
19456.50
22959.20
12258.50
12009.75
14252.75
11508.60
20010.00
17654.50
13369.80
19260.25
14460.00
21234.00
18555.50
18674.50
19818.00
15454.50
21000.00
16858.20
10988.00
17844.00
13030.50
COMM
--------612.45
846.55
650.25
1152.00
128.20
1386.70
206.60
180.00
75.60
637.65
110.10
236.50
126.50
84.20
992.80
189.65
513.30
811.50
806.10
200.30
844.00
55.50
1285.00
188.00
Not all of the above tables come with primary keys defined, so it may be worth your while to
run the following:
ALTER TABLE department ADD PRIMARY KEY (deptno);
ALTER TABLE employee   ADD PRIMARY KEY (empno);
ALTER TABLE project    ADD PRIMARY KEY (projno);
ALTER TABLE staff      ADD PRIMARY KEY (id);
Book Binding
Below is a quick-and-dirty technique for making a book out of this book. The object of the
exercise is to have a manual that will last a long time, and that will also lie flat when opened
up. All suggested actions are done at your own risk.
Tools Required
BINDER CLIPS, (1" size), to hold the pages together while gluing. To bind larger books,
or to do multiple books in one go, use two or more cheap screw clamps.
CARDBOARD: Two pieces of thick card, to also help hold things together while gluing.
Consumables
Ignoring the capital costs mentioned above, the cost of making a bound book should work out
to about $4.00 per item, almost all of which is spent on the paper and toner. To bind an already printed copy should cost less than fifty cents.
GLUE, to bind the book. Cheap rubber cement will do the job. The glue must come with
an applicator brush in the bottle. Sears hardware stores sell a more potent flavor called
Duro Contact Cement that is quite a bit better. This is toxic stuff, so be careful.
CLOTH TAPE, (2" wide) to bind the spine. Pearl tape, available from Pearl stores, is
fine. Wider tape will be required if you are not printing double-sided.
TIME: With practice, this process takes less than five minutes work per book.
Instructions
Print the book - double-sided if you can. If you want, print the first and last pages on card
stock to make suitable protective covers.
Jog the pages, so that they are all lined up along the inside spine. Make sure that every
page is perfectly aligned, otherwise some pages won't bind.
Put a piece of thick cardboard on either side of the set of pages to be bound. These will hold the pages tight during the gluing process.
Place binder clips on the top and bottom edges of the book (near the spine), to hold everything in place while you glue. One can also put a couple on the outside edge to stop the
pages from splaying out in the next step. If the pages tend to spread out in the middle of
the spine, put one in the centre of the spine, then work around it when gluing. Make sure
there are no gaps between leaves, where the glue might soak in.
Place the book spine upwards. The objective here is to have a flat surface to apply the
glue on. Lean the book against something if it does not stand up freely.
Put on gobs of glue. Let it soak into the paper for a bit, then put on some more.
Let the glue dry for at least half an hour. A couple of hours should be plenty.
Remove the binder clips that are holding the book together. Be careful because the glue
does not have much structural strength.
Separate the cardboard that was put on either side of the book pages. To do this, carefully
open the cardboard pages up (as if reading their inside covers), then run the knife down
the glue between each board and the rest of the book.
Lay the book flat with the front side facing up. Be careful here because the rubber cement
is not very strong.
Cut the tape to a length that is a little longer than the height of the book.
Put the tape on the book, lining it up so that about one quarter of an inch (of the tape
width) is on the front side of the book. Press the tape down firmly (on the front side only)
so that it is properly attached to the cover. Make sure that a little bit of tape sticks out of
both the bottom and top ends of the spine.
Turn the book over (gently) and, from the rear side, wrap the cloth tape around the spine
of the book. Pull the tape around so that it puts the spine under compression.
Trim excess tape at either end of the spine using a knife or pair of scissors.
Let the book dry for a day. Then do the old "hold by a single leaf" test. Pick any page,
and gently pull the page up into the air. The book should follow without separating from
the page.
More Information
The binding technique that I have described above is fast and easy, but rather crude. It would
not be suitable if one was printing books for sale. There are, however, other binding methods
that take a little more skill and better gear that can be used to make "store-quality" books. A
good reference on the general subject of home publishing is Book-on-Demand Publishing
(ISBN 1-881676-02-1) by Rupert Evans. The publisher is BlackLightning Publications Inc.
They are on the web (see: www.flashweb.com).
Index
A
ABS function, 113
ACOS function, 114
ADD function. See PLUS function
AGGREGATION function
BETWEEN, 108
Definition, 104
ORDER BY, 106
PARTITION, 111
RANGE, 110
ROWS, 107
Alias, 19
ALL, sub-query, 231, 241
AND vs. OR, precedence rules, 35
ANY, sub-query, 230, 239
Arithmetic, precedence rules, 35
AS statement
Correlation name, 28
Renaming fields, 29
ASCII function, 114
ASIN function, 114
ATAN function, 114
ATOMIC, BEGIN statement, 63
AVG
Definition, 280
Full-select clause, 282
Compound SQL
DECLARE variables, 64
Definition, 63
FOR statement, 65
IF statement, 66
LEAVE statement, 67
Scalar function, 180
SIGNAL statement, 67
Table function, 183
WHILE statement, 67
CONCAT function, 119, 160
Constraint, 75, 76
Convergent hierarchy, 298
Convert
Definition, 236
NOT EXISTS, 238
B
Balanced hierarchy, 299
BEGIN ATOMIC statement, 63
BERNOULLI. See TABLESAMPLE option
BETWEEN
C
Cartesian Product, 218
CASE expression
CASE usage, 42
Definition, 36
CORRELATION function, 83
Correlation name, 28
COS function, 120
COT function, 120
COUNT DISTINCT function
Definition, 83
Null values, 94
COUNT function
Definition, 83
No rows, 84, 205, 358
Null values, 83
COUNT_BIG function, 84
COVARIANCE function, 84
Create Table
Constraint, 75, 76
Dimensions, 257
Example, 18
Identity Column, 260, 263
Indexes, 256
Materialized query table, 249
Referential Integrity, 75, 76
Staging tables, 257
CUBE, 199
D
Data in view definition, 18
Data types, 20, 24
DATE
Arithmetic, 21
AVG calculation, 82
Duration, 22
Function, 121
Labeled duration, 21
Manipulation, 21, 359, 362
Output order, 365
DAY function, 121
DAYNAME function, 122
DAYOFWEEK function, 122
DAYOFYEAR function, 123
DAYS function, 123
DBPARTITIONNUM function, 124
DECIMAL
Date/Time duration, 22
Labeled duration, 21
E
ENCRYPT function, 127
ESCAPE phrase, 34
EXCEPT, 244
EXISTS, sub-query, 33, 232, 237, 238
EXP function, 128
F
FETCH FIRST clause
Definition, 27
Efficient usage, 102
Definition, 282
DELETE usage, 52, 53
INSERT usage, 46, 47
MERGE usage, 60
TABLE function, 283
UPDATE usage, 50, 51, 285
G
GENERATE_UNIQUE function, 128, 326
GET DIAGNOSTICS statement, 65
GETHINT function, 130
Global Temporary Table, 278, 285
GROUP BY
CUBE, 199
Definition, 188
GROUPING SETS, 191
Join usage, 204
ORDER BY usage, 204
PARTITION comparison, 111
ROLLUP, 195
XMLAGG function, 171
Zero rows match, 358
GROUPING function, 85, 193
GROUPING SETS, 191
H
HASHEDVALUE function, 130
HAVING
Definition, 188
Sample queries, 190
Zero rows match, 358
HEX function, 130, 187, 338, 368
Hierarchy
Balanced, 299
Convergent, 298
Denormalizing, 307
Divergent, 297
Recursive, 298
Summary tables, 307
Triggers, 307
History tables, 313, 316
HOUR function, 131
I
Identity column
Definition, 210
ON and WHERE usage, 210
Outer followed by inner, 226
INPUT SEQUENCE, 54
INSERT
MAX
Function, 85
Rows, getting, 99
Values, getting, 97, 100
Median, 352
MERGE
Definition, 57
DELETE usage, 59
Full-select, 60
INSERT usage, 59
UPDATE usage, 59
Arithmetic, 35
Convert to character, 336
Function, 132
Truncation, 364
INTERSECT, 244
ITERATE statement, 66
J
Join
Definition, 132
History, 133
N
Nested table expression, 277
NEXTVAL expression, 267
Nickname, 19
No rows match, 358
Normalize data, 344
NOT EXISTS, sub-query, 236, 238
NOT IN, sub-query, 235, 238
NOT predicate, 32
NULLIF function, 139
Nulls
Labeled Duration, 21
LCASE function, 134
LEAVE statement, 67
LEFT function, 135
Left Outer Join, 211
LENGTH function, 135
LIKE predicate
CAST expression, 36
COUNT DISTINCT function, 83, 94
COUNT function, 238
Definition, 29
GROUP BY usage, 188
Join usage, 220
Order sequence, 185
Predicate usage, 35
RANK function, 94
Ranking, 94
Definition, 34
ESCAPE usage, 34
Varchar usage, 363
LN function, 135
LOCATE function, 135
LOG function, 136
LOG10 function, 136
Lousy Index. See Circular Reference
LTRIM function, 136, 347
M
Matching rows, zero, 358
Materialized Query Table
O
OLAP functions
P
Partition
Q
Quotes, 30
R
RAISE_ERROR function, 141
RAND function
Description, 141
Predicate usage, 360
Random row selection, 144
Reproducible usage, 142
Reproducible usage, 325
Random sampling. See TABLESAMPLE option
RANGE (AGGREGATION function), 110
RANK function
Definition, 92
Nulls processing, 94
ORDER BY, 93
Partition, 95
REAL function, 145
REC2XML function, 172
Recursion
Stopping, 300
Warning message, 296
When to use, 289
Recursive hierarchy
Definition, 298
Denormalizing, 308, 310
Triggers, 308, 310
Referential Integrity, 75, 76
Refresh age, 250
Refresh Deferred tables, 250
Refresh Immediate tables, 251
REGRESSION functions, 86
REPEAT function, 145, 339
REPLACE function, 145
Restart, Identity column, 263
RETURN statement, 178
Reversing values, 346
RIGHT function, 146
Right Outer Join, 213
ROLLUP, 195
ROUND function, 146
ROW_NUMBER function, 353
Definition, 98
ORDER BY, 98
PARTITION BY, 99
ROWS (AGGREGATION function), 107
RTRIM function, 147, 347
S
Sample data. See TABLESAMPLE option
Scalar function, user defined, 177
SELECT statement
Correlation name, 28
Definition, 25
DELETE usage, 56
DML changes, 54
Full-select, 284
INSERT usage, 47
Random row selection, 144
Syntax diagram, 26
UPDATE usage, 51
Semi-colon
Create, 267
Multi table usage, 269
NEXTVAL expression, 267
PREVVAL expression, 267
Sequence numbers. See Identity column
SIGN function, 147
SIGNAL statement
Definition, 67
Trigger usage, 78, 79
SIN function, 147
SMALLINT function, 148
SOME, sub-query, 230, 239
Sort string, 349
SOUNDEX function, 148
Sourced function, 175
SPACE function, 149
Special Registers, 23
SQLCACHE_SNAPSHOT function, 149
SQRT function, 150
Correlated, 236
DELETE usage, 52
Error prone, 230
EXISTS usage, 232, 237
IN usage, 235, 237
Multi-field, 237
Nested, 237
SUBSTR function
U
UCASE function, 157
Unbalanced hierarchy, 299
Uncorrelated sub-query, 236
Nested, 237
UNION
Definition, 244
Precedence Rules, 245
UNION ALL
Definition, 244
INSERT usage, 47, 48, 246
Recursion, 290
View usage, 246
UPDATE
CASE usage, 42
Definition, 49
Full-select, 50, 51, 285
MERGE usage, 59
Multiple tables usage, 246
OLAP functions, 50
Select results, 56
Definition, 151
SUBTRACT function. See MINUS function
SUM function, 87, 106
Summary tables
T
Table. See Create Table
Table function, 182
TABLE function, 283
TABLE_NAME function, 152
TABLE_SCHEMA function, 152
TABLESAMPLE option, 332
Temporary Table
Function, 153
Manipulation, 21
Time Series data, 330
TIMESTAMP
V
VALUE function, 157
VALUES expression
Definition, 38
View usage, 39
VARCHAR function, 157
VARCHAR_FORMAT function, 158
VARIANCE function, 88
Versions (history tables), 316
View
Data in definition, 18
DDL example, 18, 19, 39
History tables, 315, 318
UNION ALL usage, 246
W
Wait. See PAUSE function
WEEK function, 158, 364
WEEK_ISO function, 158
WHERE vs. ON, joins, 209, 210, 212, 214
WHILE statement, 67
WITH statement
Definition, 280
Insert usage, 282
MAX values, getting, 100
Multiple tables, 281
Recursion, 290
VALUES expression, 38
X (hex) notation, 35
XML2CLOB function, 164
XMLAGG function, 165, 171
XMLATTRIBUTES function, 166, 170
XMLCONCAT function, 165
XMLELEMENT function, 165, 168
XMLFOREST function, 166
XMLNAMESPACES function, 167
Z
Zero divide (avoid), 42
Zero rows match, 358