SQL Advanced Queries
• Incomplete information
◦ dealing with null values
◦ Outerjoins
• Recursion in SQL99
• Interaction between SQL and a programming language
◦ embedded SQL
◦ dynamic SQL
“... those SQL features are not fully consistent; indeed, in some
ways they are fundamentally at odds with the way the world behaves.”
“Use [nulls] properly and they work for you, but abuse them,
and they can ruin everything”
POSS(T1) is the set of all relations that T1 can represent.
The following tables are all in POSS(T1). Examples:
Title            Director  Actor
Dr. Strangelove  Kubrick   Sellers
Dr. Strangelove  Kubrick   Scott
Star Wars        Polanski  Nicholson
Titanic          Polanski  Huston
Star Wars        Lucas     Ford
Frantic          Cameron   Ford

Title            Director  Actor
Dr. Strangelove  Kubrick   Sellers
Dr. Strangelove  Kubrick   Scott
Titanic          Polanski  Nicholson
Titanic          Polanski  Huston
Star Wars        Lucas     Ford
Frantic          Lucas     Ford

Title            Director  Actor
Dr. Strangelove  Kubrick   Sellers
Dr. Strangelove  Kubrick   Scott
Chinatown        Polanski  Nicholson
Chinatown        Polanski  Huston
Star Wars        Lucas     Ford
Frantic          Polanski  Ford
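[Figure: a table T represents a set of relations (..., R3); applying Q to each of them gives the answers (..., Q(R3)), which should in turn be represented by a table T’.]
T’ = Q(T)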
• Question: can we always find Q(T) for every T, and every Q in a query
language?
• Bad news: This is not possible even for very simple relational algebra
queries.
Table T (x is a null):
A  B
0  1
x  2
Query: Q = σ_{A=3}(T)
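A small sketch of why this query is troublesome, under the assumption that the null x ranges over the finite domain {0, 1, 2, 3} (the domain and the helper names are illustrative, not from the slides). It evaluates Q in every relation that T can represent and collects the distinct answers.

```python
# Sketch: evaluate Q = sigma_{A=3}(T) in every possible world of T.
# Assumption: the null x takes values in {0, 1, 2, 3}; any domain containing 3 would do.

def possible_worlds(domain):
    """Each value for the null x turns T = {(0, 1), (x, 2)} into an ordinary relation."""
    return [frozenset({(0, 1), (x, 2)}) for x in domain]

def select_a_equals_3(relation):
    """Q = sigma_{A=3}: keep the tuples whose A component is 3."""
    return frozenset(t for t in relation if t[0] == 3)

answers = {select_a_equals_3(world) for world in possible_worlds(range(4))}

for answer in sorted(answers, key=len):
    print(sorted(answer))
```

The set of answers contains both the empty relation and {(3, 2)}. A table with nulls and no rows represents only the empty relation, and a table with at least one row never represents it, so no single such table can play the role of Q(T).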
• Idea #2: extend Codd tables by adding constraints on null values (e.g.,
some must be equal, some cannot be equal, etc).
• Combining these ideas makes it possible to evaluate queries with
incomplete information.
• However, evaluation algorithms, and – more importantly – query results,
are often completely incomprehensible.
• SQL approach: there is a single general purpose NULL for all cases of
missing/inapplicable information
• Nulls occur as entries in tables; sometimes they are displayed as null,
sometimes as ’–’
• They immediately lead to comparison problems
• The union of
SELECT * FROM R WHERE R.A=1 and
SELECT * FROM R WHERE R.A<>1 should be the same as
SELECT * FROM R.
• But it is not.
• Because, if R.A is null, then neither R.A=1 nor R.A<>1 evaluates to
true.
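The comparison problem can be reproduced with a few lines of Python and the standard sqlite3 module. This is only a sketch: the table R and its three rows are made up for illustration, and SQLite stands in for whatever DBMS the course uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A INTEGER)")
# Hypothetical contents: two ordinary values and one null.
conn.executemany("INSERT INTO R VALUES (?)", [(1,), (2,), (None,)])

# Union of the two selections from the slide.
union_rows = conn.execute(
    "SELECT * FROM R WHERE R.A = 1 "
    "UNION "
    "SELECT * FROM R WHERE R.A <> 1"
).fetchall()

all_rows = conn.execute("SELECT * FROM R").fetchall()

print(len(union_rows), "rows in the union")
print(len(all_rows), "rows in R")
```

The row with the null satisfies neither condition, so it is in R but missing from the union.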
returns 2
returns 1
• Tables:
Missile
#   Target
M1  A
M2  B
M3  C

Intercept
I#  Missile  Status
I1  M1       active
I2  null     active
• {A, B, C} are in USCities
• The query returns the empty set:
M2 NOT IN {M1, null} and M3 NOT IN {M1, null}
evaluate to unknown.
• Yet at least one of M2 and M3 is not being intercepted!
• Highly unlikely? Probably (and hopefully). But never forget what
caused the Mars Climate Orbiter to crash!
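The query itself is not shown in this section, so the following Python/sqlite3 sketch uses one plausible formulation: missiles aimed at US cities that are not in the set of actively intercepted missiles. The column names Id, Iid and City and the exact SQL text are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Missile (Id TEXT, Target TEXT);              -- Id stands for '#'
    CREATE TABLE Intercept (Iid TEXT, Missile TEXT, Status TEXT);  -- Iid stands for 'I#'
    CREATE TABLE USCities (City TEXT);
    INSERT INTO Missile VALUES ('M1', 'A'), ('M2', 'B'), ('M3', 'C');
    INSERT INTO Intercept VALUES ('I1', 'M1', 'active'), ('I2', NULL, 'active');
    INSERT INTO USCities VALUES ('A'), ('B'), ('C');
""")

# Assumed form of the query: which missiles aimed at US cities are not intercepted?
query = """
    SELECT M.Id
    FROM Missile M
    WHERE M.Target IN (SELECT City FROM USCities)
      AND M.Id NOT IN (SELECT I.Missile FROM Intercept I WHERE I.Status = 'active')
"""
unintercepted = conn.execute(query).fetchall()

# The comparison behind the empty answer: NOT IN against a set containing null.
m2_check = conn.execute("SELECT 'M2' NOT IN ('M1', NULL)").fetchone()[0]

print(unintercepted)
print(m2_check)
```

The second query returns SQL's unknown (None in Python), which is why neither M2 nor M3 passes the WHERE clause.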
• Example:
Studio
Name          Title
’United’      ’Licence to kill’
’United’      ’Rain man’
’Dreamworks’  ’Gladiator’

Film
Title              Gross
’Licence to kill’  156
’Rain man’         412
’Fargo’            25
• Query: for each studio, find the total gross of its movies:
SELECT Studio.Name, SUM(Film.Gross)
FROM Studio NATURAL JOIN Film
GROUP BY Studio.Name
• Answer: (’United’, 568)
• But often we want (’Dreamworks’, ...) as well!
• ’Dreamworks’ is lost because ’Gladiator’ doesn’t match anything in
Film.
Database Systems 23 L. Libkin
Outerjoins
• Warning: if you use outerjoins in aggregate queries, you get null values
as results of all aggregates except COUNT
CREATE VIEW V (B1, B2) AS
SELECT Studio.Name, Film.Gross
FROM Studio NATURAL LEFT OUTER JOIN Film
SELECT * FROM V
returns
B1            B2
’Dreamworks’  null
’United’      156
’United’      412
• SELECT B1, SUM(B2) FROM V GROUP BY B1
returns (’Dreamworks’, null), (’United’, 568).
• SELECT B1, COUNT(B2) FROM V GROUP BY B1
returns (’Dreamworks’, 0), (’United’, 2).
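The Studio/Film example can be replayed in Python with the standard sqlite3 module. This is a sketch under the assumption that SQLite's NATURAL LEFT OUTER JOIN behaves like the one on the slides; the ORDER BY clauses are added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Studio (Name TEXT, Title TEXT);
    CREATE TABLE Film (Title TEXT, Gross INTEGER);
    INSERT INTO Studio VALUES ('United', 'Licence to kill'),
                              ('United', 'Rain man'),
                              ('Dreamworks', 'Gladiator');
    INSERT INTO Film VALUES ('Licence to kill', 156),
                            ('Rain man', 412),
                            ('Fargo', 25);
    CREATE VIEW V (B1, B2) AS
        SELECT Studio.Name, Film.Gross
        FROM Studio NATURAL LEFT OUTER JOIN Film;
""")

# Plain natural join: 'Dreamworks' disappears.
inner_sums = conn.execute(
    "SELECT Studio.Name, SUM(Film.Gross) "
    "FROM Studio NATURAL JOIN Film GROUP BY Studio.Name"
).fetchall()

# Outer join via the view: 'Dreamworks' is kept, padded with a null.
outer_sums = conn.execute(
    "SELECT B1, SUM(B2) FROM V GROUP BY B1 ORDER BY B1"
).fetchall()
outer_counts = conn.execute(
    "SELECT B1, COUNT(B2) FROM V GROUP BY B1 ORDER BY B1"
).fetchall()

print(inner_sums)
print(outer_sums)
print(outer_counts)
```

SUM over a group whose only value is null yields null, while COUNT(B2) counts non-null values and therefore yields 0.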
Outerjoins cont’d
• Some systems don’t like the keyword NATURAL and would only let you
do
R LEFT/RIGHT/FULL OUTER JOIN S
ON condition
• Example:
SELECT *
FROM Studio LEFT OUTER JOIN Film ON
Studio.Title=Film.Title
• Result:
Name          Title              Title              Gross
’United’      ’Licence to kill’  ’Licence to kill’  156
’United’      ’Rain man’         ’Rain man’         412
’Dreamworks’  ’Gladiator’        null               null
• Reachability queries:
Flights Src Dest
’EDI’ ’LHR’
’EDI’ ’EWR’
’EWR’ ’LAX’
··· ···
• Query: Find pairs of cities (A, B) such that one can fly from A to B
with at most one stop:
• Query: Find pairs of cities (A, B) such that one can fly from A to B
with at most two stops:
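The SQL for these two queries is not shown above; one way to write them, as a sketch, is with self-joins of Flights, one extra copy per extra stop. The Python/sqlite3 wrapper, the alias names F1, F2, F3 and the extra flight ('LAX', 'SFO') are assumptions added so that the two answers differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Flights (Src TEXT, Dest TEXT)")
conn.executemany(
    "INSERT INTO Flights VALUES (?, ?)",
    [("EDI", "LHR"), ("EDI", "EWR"), ("EWR", "LAX"),
     ("LAX", "SFO")],  # hypothetical extra flight, not on the slide
)

# At most one stop: direct flights, or two flights chained through a middle city.
ONE_STOP = """
    SELECT Src, Dest FROM Flights
    UNION
    SELECT F1.Src, F2.Dest
    FROM Flights F1, Flights F2
    WHERE F1.Dest = F2.Src
"""

# At most two stops: additionally allow three chained flights.
TWO_STOPS = ONE_STOP + """
    UNION
    SELECT F1.Src, F3.Dest
    FROM Flights F1, Flights F2, Flights F3
    WHERE F1.Dest = F2.Src AND F2.Dest = F3.Src
"""

one_stop = set(conn.execute(ONE_STOP).fetchall())
two_stops = set(conn.execute(TWO_STOPS).fetchall())

print(sorted(one_stop))
print(sorted(two_stops - one_stop))
```

Each additional stop needs another copy of Flights in the FROM clause, so no fixed query of this shape covers an arbitrary number of stops.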
• For any number of stops, reachability can be defined by rules:
reach(x, y) :– flights(x, y)
reach(x, y) :– flights(x, z), reach(z, y)
• One of these rules is recursive: reach refers to itself.
• Evaluation:
- Step 0: reach_0 is initialized as the empty set.
- Step i + 1: Compute
reach_{i+1}(x, y) :– flights(x, y)
reach_{i+1}(x, y) :– flights(x, z), reach_i(z, y)
- Stop condition: If reach_{i+1} = reach_i, then it is the answer to the
query.
• Example: assume that flights contains (a, b), (b, c), (c, d).
• Step 0: reach = ∅
• Step 1: reach becomes {(a, b), (b, c), (c, d)}.
• Step 2: reach becomes {(a, b), (b, c), (c, d), (a, c), (b, d)}.
• Step 3: reach becomes {(a, b), (b, c), (c, d), (a, c), (b, d), (a, d)}.
• Step 4: one attempts to use the rules, but infers no new values for
reach. The final answer is thus:
{(a, b), (b, c), (c, d), (a, c), (b, d), (a, d)}
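The evaluation steps above can be sketched in Python as a loop that re-applies both rules until nothing changes. The function name reach_fixpoint is illustrative. For comparison, the same answer is computed with a WITH RECURSIVE query in SQLite; that query is one possible SQL rendering of the two rules and is an assumption, not text from the slides.

```python
import sqlite3

def reach_fixpoint(flights):
    """Naive evaluation: recompute reach from the rules until it stops changing."""
    reach = set()                      # Step 0: empty set
    steps = 0
    while True:
        steps += 1
        new_reach = set(flights)       # reach(x, y) :- flights(x, y)
        new_reach |= {                 # reach(x, y) :- flights(x, z), reach(z, y)
            (x, y)
            for (x, z) in flights
            for (z2, y) in reach
            if z == z2
        }
        if new_reach == reach:         # stop condition: reach_{i+1} = reach_i
            return reach, steps
        reach = new_reach

flights = {("a", "b"), ("b", "c"), ("c", "d")}
reach, steps = reach_fixpoint(flights)

# The same rules as a recursive query (assumed SQL formulation).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Flights (Src TEXT, Dest TEXT)")
conn.executemany("INSERT INTO Flights VALUES (?, ?)", sorted(flights))
sql_reach = set(conn.execute("""
    WITH RECURSIVE Reach(Src, Dest) AS (
        SELECT Src, Dest FROM Flights
        UNION
        SELECT F.Src, R.Dest FROM Flights F, Reach R WHERE F.Dest = R.Src
    )
    SELECT Src, Dest FROM Reach
""").fetchall())

print(sorted(reach), steps)
```

On this input the loop stops in its fourth step, matching the trace above, and both computations give the same six pairs.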
• Problematic recursion:
WITH RECURSIVE R(A) AS
(SELECT S.A
FROM S
WHERE S.A NOT IN
(SELECT R.A FROM R))
SELECT * FROM R
• Formulated as a rule:
r(x) :– s(x), ¬r(x)
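A sketch of what goes wrong if one tries to evaluate this rule with the same step-by-step procedure used for reach. The contents of s and the function name step are assumptions for illustration.

```python
def step(s, r):
    """One application of the rule r(x) :- s(x), not r(x)."""
    return {x for x in s if x not in r}

s = {1, 2}          # hypothetical contents of S
r = set()           # Step 0: r is empty
history = [r]
for _ in range(4):
    r = step(s, r)
    history.append(r)

for i, value in enumerate(history):
    print(i, sorted(value))
```

The computed value alternates between the empty set and all of s, so the stop condition is never met when s is non-empty. With negation, adding tuples to r can remove tuples from the next step, which is what makes this kind of recursion problematic.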
• SQL is good for querying, but not so good for complex computational
tasks. It is not very good for displaying results nicely.
• Moreover, queries and updates typically occur inside complex
computations, for which SQL is not a suitable language.
• Thus, one most often runs SQL queries from host programming
languages, and then processes the results.
• One approach: extend SQL.
SQL3 can do many queries that SQL2 couldn’t do. But sometimes one
still needs to do some operations in a programming language.
• SQL offers two flavors of communicating with a PL:
embedded SQL,
dynamic SQL.
• Basic rule: if you know SQL, and you know the programming language,
then you know embedded/dynamic SQL.
SQL and programming languages cont’d
• DBMS tells the host language what the state of the database is via a
special variable called SQLSTATE
• In C, it is commonly declared as char SQLSTATE[6].
• Two most important values:
’00000’ means “no error”
’02000’ means “requested tuple not found”.
The latter is used to break loops.
• Why 6 characters? Because in C we commonly use the function strcmp
to compare strings, which expects the last symbol to be ’\0’. Thus
we declare char SQLSTATE[6] and initially set the 6th character to
’\0’.
• To declare variables shared between C and SQL, one puts them between
EXEC SQL BEGIN DECLARE SECTION
and
EXEC SQL END DECLARE SECTION
• Each variable var declared in this section will be referred to as :var
in SQL queries.
• Example:
EXEC SQL BEGIN DECLARE SECTION;
char title[20], theater[20];
int showtime;
char SQLSTATE[6];
EXEC SQL END DECLARE SECTION;
• With these declarations, we can write a program that prompts the user
for title, theater, and showtime, and inserts a tuple into Schedule.
• void InsertIntoSchedule() {
• Task: prompt the user for theater and showtime, and return the title
(if it can be found), but only if it is a movie directed by Spielberg.
• void FindTitle() {
if (strcmp(SQLSTATE,"02000") != 0)
printf("title = %s\n", tl);
else printf("no title found\n");
return;
}
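For readers who want to try the two tasks without an embedded-SQL precompiler, here is a rough Python analogue using the standard sqlite3 module. It is not embedded SQL in C: named parameters such as :title play the role of the shared variables, and a missing row (None from fetchone) plays the role of SQLSTATE '02000'. The schemas Schedule(Theater, Showtime, Title) and Movies(Title, Director), the function names and the sample rows are assumptions.

```python
import sqlite3

def insert_into_schedule(conn, title, theater, showtime):
    """Insert one tuple into Schedule, passing host values as named parameters."""
    conn.execute(
        "INSERT INTO Schedule (Theater, Showtime, Title) "
        "VALUES (:theater, :showtime, :title)",
        {"theater": theater, "showtime": showtime, "title": title},
    )
    conn.commit()

def find_title(conn, theater, showtime):
    """Return the title shown at (theater, showtime) if Spielberg directed it, else None."""
    row = conn.execute(
        "SELECT S.Title FROM Schedule S, Movies M "
        "WHERE S.Title = M.Title AND M.Director = 'Spielberg' "
        "AND S.Theater = :theater AND S.Showtime = :showtime",
        {"theater": theater, "showtime": showtime},
    ).fetchone()
    return None if row is None else row[0]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Schedule (Theater TEXT, Showtime INTEGER, Title TEXT);
    CREATE TABLE Movies (Title TEXT, Director TEXT);
    INSERT INTO Movies VALUES ('Jaws', 'Spielberg'), ('Chinatown', 'Polanski');
""")
insert_into_schedule(conn, "Jaws", "Odeon", 20)
insert_into_schedule(conn, "Chinatown", "Odeon", 18)

found = find_title(conn, "Odeon", 20)
missing = find_title(conn, "Odeon", 18)
print("title =", found)
print("no title found" if missing is None else missing)
```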
• Single-tuple insertions or selections are rare when one deals with DBMSs:
SQL is designed to operate with tables.
• However, programming languages operate with variables, not tables.
• Mechanism to connect them: cursors.
• A cursor allows a program to access a table, one row at a time.
• A cursor can be declared for a table in the database, or the result of a
query.
• Variables from a program can be used in queries for which cursors are
declared if they are preceded by a colon.
• Open cursor:
EXEC SQL OPEN C_movies;
• Fetch – retrieves the value of the current tuple and assigns fields
to variables from the host language.
• Syntax:
EXEC SQL FETCH <cursor> INTO <variables>
• Examples:
EXEC SQL FETCH C_th_dir INTO :th;
fetches the current value of theater to which the cursor C_th_dir
points, puts the value in th, and moves the cursor to the next position.
• If there are multiple fields:
EXEC SQL FETCH C_Movies INTO :tl, :dir, :act, :length;
Fetches the current (tl, dir, act, length) tuple from Movies,
and moves to the next tuple.
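/* assumed definition: true once SQLSTATE reports "requested tuple not found" */
#define NO_MORE_TUPLES !(strcmp(SQLSTATE, "02000"))

void FindTheaters() {
int i;
EXEC SQL BEGIN DECLARE SECTION;
char dir[20], th[20], SQLSTATE[6];
EXEC SQL END DECLARE SECTION;
/* assumes cursor C_th_dir has already been declared and opened */
i=0;
while (i < 5) { /* print at most five theaters */
EXEC SQL FETCH C_th_dir INTO :th;
if (NO_MORE_TUPLES) break;
else printf("theater\t%s\n", th);
++i;
}
EXEC SQL CLOSE C_th_dir;
}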
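The fetch loop can be tried out in Python, where a sqlite3 cursor plays a similar role: fetchone() returns the current row and advances, and returns None when there are no more tuples. The query behind the cursor is not shown in this section, so the one below (theaters showing movies by a given director) is an assumption, as are the schema and sample data.

```python
import sqlite3

def find_theaters(conn, director, limit=5):
    """Fetch at most `limit` theaters, one row at a time, through a cursor."""
    cur = conn.execute(
        "SELECT DISTINCT S.Theater FROM Schedule S, Movies M "
        "WHERE S.Title = M.Title AND M.Director = :dir "
        "ORDER BY S.Theater",
        {"dir": director},
    )
    theaters = []
    while len(theaters) < limit:
        row = cur.fetchone()       # like EXEC SQL FETCH ... INTO :th
        if row is None:            # like the NO_MORE_TUPLES test
            break
        theaters.append(row[0])
    cur.close()                    # like EXEC SQL CLOSE
    return theaters

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Schedule (Theater TEXT, Title TEXT);
    CREATE TABLE Movies (Title TEXT, Director TEXT);
    INSERT INTO Movies VALUES ('Jaws', 'Spielberg'), ('Chinatown', 'Polanski');
""")
conn.executemany(
    "INSERT INTO Schedule VALUES (?, 'Jaws')",
    [("T%d" % n,) for n in range(1, 8)],   # seven hypothetical theaters
)
conn.execute("INSERT INTO Schedule VALUES ('T9', 'Chinatown')")

first_five = find_theaters(conn, "Spielberg")
polanski = find_theaters(conn, "Polanski")
for th in first_five:
    print("theater\t%s" % th)
```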
void ChangeTime() {
EXEC SQL BEGIN DECLARE SECTION;
char tl[20], dir[20], act[20], SQLSTATE[6];
float length;
EXEC SQL END DECLARE SECTION;
EXEC SQL DECLARE C_Movies CURSOR FOR Movies;
Using cursors for updates cont’d
• Connecting to a database:
strcpy(db_name, "my-database");
EXEC SQL CONNECT TO :db_name;
• If user names and passwords are required, use:
EXEC SQL CONNECT TO :db_name USER :userid USING :passwd;
• Disconnecting:
EXEC SQL CONNECT RESET;
• Save all changes made by the program:
EXEC SQL COMMIT;
• Rollback (for unsuccessful termination):
EXEC SQL ROLLBACK;
• So far we assumed that there is only one user. In reality this is not
true.
• While a cursor is open on a table, some other user could modify that
table. This could lead to problems.
• One way of addressing this: insensitive cursors.
EXEC SQL DECLARE C1
INSENSITIVE CURSOR FOR
SELECT Title, Director
FROM Movies;
• This guarantees that if someone modifies Movies while C1 is open, it
won’t affect the set of fetched tuples.
• This is a very expensive solution and is not used very often.
void Transfer() {
/* we ask the user to enter
acct_from, acct_to, amount */
• Assume that acct_to and acct_from are joint accounts, with two
people being authorized to do transfers.
• Let acct_from have $1000. Suppose both users try to transfer $1000
from this account.
• Sequence of events:
◦ User 1 initiates a transfer. Condition is checked and the first UPDATE
statement is executed.
◦ User 2 initiates a transfer. Condition is checked, and met, since
the second UPDATE statement from the first transfer hasn’t been
executed yet. Now both UPDATE statements are executed.
◦ User 1’s transfer operation is finished.
• acct_from has balance −$1000, despite an apparent safeguard against
a situation like this.
• If the test (balance1 < amount) is true, we may prefer to abort the
transaction:
if (balance1 < amount) {
printf("Insufficient amount in account %d\n", acct_from);
EXEC SQL ROLLBACK;
}
• If there are sufficient funds, we can put after the UPDATE statements
EXEC SQL COMMIT;
to indicate successful completion.
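A sketch of the transfer as a single transaction, written in Python with sqlite3 rather than embedded SQL. The table Accounts(AcctNo, Balance) and the function name transfer are assumptions. The balance check and both UPDATE statements sit between one BEGIN and one COMMIT or ROLLBACK, so a second transfer sees the result of the first one rather than an intermediate state.

```python
import sqlite3

def transfer(conn, acct_from, acct_to, amount):
    """Move `amount` between accounts; commit on success, roll back otherwise."""
    conn.execute("BEGIN IMMEDIATE")        # in SQLite: take the write lock up front
    try:
        row = conn.execute(
            "SELECT Balance FROM Accounts WHERE AcctNo = :a", {"a": acct_from}
        ).fetchone()
        if row is None or row[0] < amount:
            print("Insufficient amount in account %d" % acct_from)
            conn.execute("ROLLBACK")       # abort the transaction
            return False
        conn.execute(
            "UPDATE Accounts SET Balance = Balance + :m WHERE AcctNo = :a",
            {"m": amount, "a": acct_to},
        )
        conn.execute(
            "UPDATE Accounts SET Balance = Balance - :m WHERE AcctNo = :a",
            {"m": amount, "a": acct_from},
        )
        conn.execute("COMMIT")             # successful completion
        return True
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise

# isolation_level=None lets the code issue BEGIN/COMMIT/ROLLBACK itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Accounts (AcctNo INTEGER PRIMARY KEY, Balance INTEGER)")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [(1, 1000), (2, 0)])

first = transfer(conn, 1, 2, 1000)    # user 1
second = transfer(conn, 1, 2, 1000)   # user 2, same joint account
balances = dict(conn.execute("SELECT AcctNo, Balance FROM Accounts").fetchall())
print(first, second, balances)
```

Here the second transfer is refused and the balance of the source account never goes below zero.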