SQL: Part I: Introduction To Databases Compsci 316 Fall 2014
SQL
• SQL: Structured Query Language
• Pronounced “S-Q-L” or “sequel”
• The standard query language supported by most DBMS
• A brief history
• IBM System R
• ANSI SQL89
• ANSI SQL92 (SQL2)
• ANSI SQL99 (SQL3)
• ANSI SQL 2003 (added OLAP, XML, etc.)
• ANSI SQL 2006 (added more XML)
• ANSI SQL 2008, …
Example: reading a table
• SELECT * FROM User;
• Single-table query, so no cross product here
• WHERE clause is optional
• * is a shorthand for “all columns”
Example: join
• IDs and names of groups with a user whose name contains “Simpson”
• SELECT Group.gid, Group.name
  FROM User, Member, Group
  WHERE User.uid = Member.uid
  AND Member.gid = Group.gid
  AND User.name LIKE '%Simpson%';
• LIKE matches a string against a pattern
• % matches any sequence of 0 or more characters
• Okay to omit table_name in table_name.column_name if column_name is unique
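To try the join yourself, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in DBMS. The table contents are hypothetical toy data. "Group" is double-quoted because GROUP is a reserved word in SQLite. sorted() is applied only to make the output order predictable.

```python
import sqlite3

# Hypothetical toy instance of the User/Member/Group schema used in these slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User (uid INTEGER PRIMARY KEY, name TEXT, age INTEGER, pop REAL);
    CREATE TABLE "Group" (gid TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE Member (uid INTEGER, gid TEXT);
    INSERT INTO User VALUES (142, 'Bart Simpson', 10, 0.9),
                            (857, 'Lisa Simpson', 8, 0.7),
                            (456, 'Milhouse Van Houten', 10, 0.2);
    INSERT INTO "Group" VALUES ('abc', 'Book Club'), ('dps', 'Dead Putting Society'),
                               ('gov', 'Student Government'), ('spr', 'Sports Team');
    INSERT INTO Member VALUES (142, 'dps'), (857, 'abc'), (857, 'gov'),
                              (456, 'abc'), (456, 'spr');
""")

# GROUP is reserved in SQLite, so the table name is double-quoted.
rows = sorted(conn.execute("""
    SELECT "Group".gid, "Group".name
    FROM User, Member, "Group"
    WHERE User.uid = Member.uid
      AND Member.gid = "Group".gid
      AND User.name LIKE '%Simpson%'
""").fetchall())
print(rows)  # [('abc', 'Book Club'), ('dps', 'Dead Putting Society'), ('gov', 'Student Government')]
```

The query has no DISTINCT, so each group is listed once per matching Simpson member. For instance, if Bart also joined abc, ('abc', 'Book Club') would appear twice.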
Example: rename
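• IDs of all pairs of users that belong to the same group
• Relational algebra query:
  $\pi_{m1.uid,\, m2.uid}\left(\rho_{m1}(\mathit{Member}) \bowtie_{m1.gid = m2.gid \,\wedge\, m1.uid > m2.uid} \rho_{m2}(\mathit{Member})\right)$
• SQL:
  SELECT m1.uid AS uid1, m2.uid AS uid2
  FROM Member AS m1, Member AS m2
  WHERE m1.gid = m2.gid
  AND m1.uid > m2.uid;
• AS keyword is completely optional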
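Here is a small sqlite3 check of the self-join on hypothetical Member rows. The condition m1.uid > m2.uid drops pairs of a user with themselves and keeps only one ordering of each pair. SQL returns a bag, so a pair that shares two groups shows up twice unless DISTINCT is added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Member rows: 857 and 456 share two groups (abc and gov).
conn.executescript("""
    CREATE TABLE Member (uid INTEGER, gid TEXT);
    INSERT INTO Member VALUES (142, 'dps'), (857, 'abc'), (456, 'abc'),
                              (789, 'abc'), (456, 'gov'), (857, 'gov');
""")

query = """
    SELECT {distinct} m1.uid AS uid1, m2.uid AS uid2
    FROM Member AS m1, Member AS m2
    WHERE m1.gid = m2.gid
      AND m1.uid > m2.uid
"""
# Bag semantics: one output row per matching (m1, m2) combination.
pairs = sorted(conn.execute(query.format(distinct="")).fetchall())
# DISTINCT turns the bag into a set, like the relational algebra result.
distinct_pairs = sorted(conn.execute(query.format(distinct="DISTINCT")).fetchall())
print(pairs)           # [(789, 456), (857, 456), (857, 456), (857, 789)]
print(distinct_pairs)  # [(789, 456), (857, 456), (857, 789)]
```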
Semantics of SFW
• SELECT [DISTINCT] E1, E2, …, En
  FROM R1, R2, …, Rm
  WHERE condition;
• For each t1 in R1:
    For each t2 in R2: … …
      For each tm in Rm:
        If condition is true over t1, t2, …, tm:
          Compute and output E1, E2, …, En as a row
  If DISTINCT is present
    Eliminate duplicate rows in output
• t1, t2, …, tm are often called tuple variables
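The nested-loop semantics map directly onto Python. This is a naive sketch for illustration only. The helper sfw and its row-as-dict representation are hypothetical. A real DBMS evaluates queries far more cleverly, but it must produce the same result.

```python
from itertools import product


def sfw(tables, condition, select, distinct=False):
    """Naive evaluation of SELECT [DISTINCT] ... FROM R1, ..., Rm WHERE condition.

    tables:    list of relations; each relation is a list of rows (dicts)
    condition: function of the tuple variables (t1, ..., tm) returning bool
    select:    function of the tuple variables returning one output row (a tuple)
    """
    output = []
    # product(*tables) enumerates every combination (t1, ..., tm),
    # exactly like the nested "for each" loops.
    for ts in product(*tables):
        if condition(*ts):
            output.append(select(*ts))
    if distinct:
        # Eliminate duplicate rows, keeping the first occurrence of each.
        output = list(dict.fromkeys(output))
    return output


# Hypothetical Member rows, queried with the same condition as the rename example.
member = [{"uid": 857, "gid": "abc"}, {"uid": 456, "gid": "abc"},
          {"uid": 857, "gid": "gov"}, {"uid": 456, "gid": "gov"}]


def same_group_ordered(m1, m2):
    return m1["gid"] == m2["gid"] and m1["uid"] > m2["uid"]


def uid_pair(m1, m2):
    return (m1["uid"], m2["uid"])


pairs = sfw([member, member], same_group_ordered, uid_pair)
distinct_pairs = sfw([member, member], same_group_ordered, uid_pair, distinct=True)
print(pairs)           # [(857, 456), (857, 456)]
print(distinct_pairs)  # [(857, 456)]
```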
Table expression
• Use query result as a table
• In set and bag operations, FROM clauses, etc.
• A way to “nest” queries
• Example: names of users who poked others more than others poked them
• SELECT DISTINCT name
  FROM User,
    (SELECT uid1 AS uid FROM Poke
     EXCEPT ALL
     SELECT uid2 AS uid FROM Poke) AS T
  WHERE User.uid = T.uid;
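SQLite, for one, does not support EXCEPT ALL, so the bag difference is sketched here in Python with collections.Counter instead. Counter subtraction keeps only positive counts, which matches EXCEPT ALL on bags. The Poke rows (uid1 poked uid2) and the user names are hypothetical.

```python
from collections import Counter

# Hypothetical Poke(uid1, uid2) rows: uid1 poked uid2.
poke = [(142, 857), (142, 456), (142, 857), (857, 142), (456, 142), (456, 857)]
# Hypothetical User(uid, name) lookup.
users = {142: "Bart", 857: "Lisa", 456: "Milhouse"}

# SELECT uid1 AS uid FROM Poke: one copy of uid per poke sent
sent = Counter(uid1 for uid1, _ in poke)
# SELECT uid2 AS uid FROM Poke: one copy of uid per poke received
received = Counter(uid2 for _, uid2 in poke)
# EXCEPT ALL: bag difference. Counter subtraction drops counts <= 0,
# so a uid survives only if it sent more pokes than it received.
T = sent - received
# SELECT DISTINCT name FROM User, T WHERE User.uid = T.uid
names = sorted({users[uid] for uid in T})
print(names)  # ['Bart', 'Milhouse']

# For contrast, set-based EXCEPT keeps only uids that never received a poke.
set_except = set(sent) - set(received)
print(set_except)  # set()
```

With plain (set) EXCEPT, the result for this data is empty, because every user both poked and was poked at least once. The multiplicities are what matter here.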
Scalar subqueries
• A query that returns a single row (with a single column) can be used as a value in WHERE, SELECT, etc.
• Example: users who are the same age as Bart
• SELECT *
  FROM User
  WHERE age = (SELECT age          -- What’s Bart’s age?
               FROM User
               WHERE name = 'Bart');
• Runtime error if subquery returns more than one row
• Under what condition will this error never occur?
• What if the subquery returns no rows?
• The answer is treated as a special value NULL, and a comparison with NULL is never true, so no rows are returned
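Here is a quick sqlite3 check of the no-row case on hypothetical data. It selects uid instead of * only to keep the output small. One caveat: SQLite does not raise the runtime error described above when a scalar subquery returns several rows. It silently uses the first row, so the multi-row case cannot be demonstrated there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical User rows.
conn.executescript("""
    CREATE TABLE User (uid INTEGER PRIMARY KEY, name TEXT, age INTEGER, pop REAL);
    INSERT INTO User VALUES (142, 'Bart', 10, 0.9), (857, 'Lisa', 8, 0.7),
                            (456, 'Milhouse', 10, 0.2);
""")

query = """
    SELECT uid FROM User
    WHERE age = (SELECT age FROM User WHERE name = ?)
"""
same_age_as_bart = sorted(r[0] for r in conn.execute(query, ("Bart",)))
# No user named 'Homer': the subquery yields NULL, and age = NULL is never true.
same_age_as_homer = conn.execute(query, ("Homer",)).fetchall()
# The scalar subquery on its own evaluates to NULL (None in Python).
homer_age = conn.execute(
    "SELECT (SELECT age FROM User WHERE name = 'Homer')").fetchone()[0]
print(same_age_as_bart, same_age_as_homer, homer_age)  # [142, 456] [] None
```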
IN subqueries
• x IN (subquery) checks if x is in the result of subquery
• Example: users who are the same age as (some) Bart
• SELECT *
  FROM User
  WHERE age IN (SELECT age          -- What’s Bart’s age?
                FROM User
                WHERE name = 'Bart');
EXISTS subqueries
• EXISTS (subquery) checks if the result of subquery is non-empty
• Example: users who are the same age as (some) Bart
• SELECT *
  FROM User AS u
  WHERE EXISTS (SELECT * FROM User
                WHERE name = 'Bart'
                AND age = u.age);
• This happens to be a correlated subquery: a subquery that references tuple variables in surrounding queries
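With two hypothetical users named Bart at different ages, the IN and EXISTS formulations should agree. Here is a minimal sqlite3 sketch. It again selects uid rather than * for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User (uid INTEGER PRIMARY KEY, name TEXT, age INTEGER, pop REAL);
    -- Hypothetical data with two users named Bart (ages 10 and 40)
    INSERT INTO User VALUES (142, 'Bart', 10, 0.9), (999, 'Bart', 40, 0.3),
                            (857, 'Lisa', 8, 0.7), (456, 'Milhouse', 10, 0.2),
                            (123, 'Moe', 40, 0.1);
""")

in_query = """
    SELECT uid FROM User
    WHERE age IN (SELECT age FROM User WHERE name = 'Bart')
"""
# Unqualified name/age refer to the inner User; u.age to the outer one.
exists_query = """
    SELECT uid FROM User AS u
    WHERE EXISTS (SELECT * FROM User
                  WHERE name = 'Bart'
                    AND age = u.age)
"""
in_result = sorted(r[0] for r in conn.execute(in_query))
exists_result = sorted(r[0] for r in conn.execute(exists_query))
print(in_result, exists_result)  # [123, 142, 456, 999] [123, 142, 456, 999]
```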
Semantics of subqueries
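• SELECT *
  FROM User AS u
  WHERE EXISTS (SELECT * FROM User
                WHERE name = 'Bart'
                AND age = u.age);
• For each row u in User:
  • Evaluate the subquery with the value of u.age
  • If the result of the subquery is not empty, output u.*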
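The per-row evaluation of this correlated subquery can be written as an explicit loop in Python. This is a sketch over hypothetical (uid, name, age) tuples, and the function name is made up. A DBMS is free to evaluate the query differently (for example, as a join) as long as the result is the same.

```python
# Hypothetical User rows as (uid, name, age) tuples.
users = [(142, "Bart", 10), (857, "Lisa", 8), (456, "Milhouse", 10)]


def same_age_as_some_bart(users):
    """Nested-loop reading of the correlated EXISTS query (illustrative only)."""
    out = []
    for u in users:                      # for each row u of the outer User AS u
        _, _, u_age = u
        # Re-evaluate the subquery with the current value of u.age plugged in.
        sub = [v for v in users if v[1] == "Bart" and v[2] == u_age]
        if sub:                          # EXISTS: the subquery result is non-empty
            out.append(u)                # output u.*
    return out


result = same_age_as_some_bart(users)
print(result)  # [(142, 'Bart', 10), (456, 'Milhouse', 10)]
```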
Another example
• SELECT * FROM User u
  WHERE EXISTS
    (SELECT * FROM Member m
     WHERE uid = u.uid
     AND EXISTS
       (SELECT * FROM Member
        WHERE uid = u.uid
        AND gid <> m.gid));
• Users who are members of at least two groups
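Here is the doubly nested query run in sqlite3 on hypothetical data in which only Lisa (857) belongs to two groups. SELECT uid replaces SELECT * for brevity. The unqualified uid and gid resolve to the nearest enclosing Member, and that scoping is what makes the query correct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: 857 is in abc and gov; the others are in one group each.
conn.executescript("""
    CREATE TABLE User (uid INTEGER PRIMARY KEY, name TEXT, age INTEGER, pop REAL);
    CREATE TABLE Member (uid INTEGER, gid TEXT);
    INSERT INTO User VALUES (142, 'Bart', 10, 0.9), (857, 'Lisa', 8, 0.7),
                            (456, 'Milhouse', 10, 0.2);
    INSERT INTO Member VALUES (142, 'dps'), (857, 'abc'), (857, 'gov'), (456, 'abc');
""")

rows = conn.execute("""
    SELECT uid FROM User u
    WHERE EXISTS
      (SELECT * FROM Member m
       WHERE uid = u.uid
         AND EXISTS
           (SELECT * FROM Member
            WHERE uid = u.uid
              AND gid <> m.gid))
""").fetchall()
print(rows)  # [(857,)]
```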
Quantified subqueries
• A quantified subquery can be used syntactically as a value in a WHERE condition
• Universal quantification (for all):
  … WHERE x op ALL(subquery) …
  • True iff for all t in the result of subquery, x op t
• Existential quantification (exists):
  … WHERE x op ANY(subquery) …
  • True iff there exists some t in subquery result such that x op t
Beware
• In common parlance, “any” and “all” seem to be synonyms
• In SQL, ANY really means “some”
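The two definitions translate almost word for word into Python's all() and any(). The helper names are hypothetical, and the sketch ignores NULLs, which SQL handles with three-valued logic. Note the empty-result cases: ALL over an empty result is vacuously true, while ANY over an empty result is false.

```python
import operator


def quantified_all(x, op, subquery_result):
    """x op ALL(subquery): true iff x op t holds for every t in the result."""
    return all(op(x, t) for t in subquery_result)


def quantified_any(x, op, subquery_result):
    """x op ANY(subquery): true iff x op t holds for at least one t."""
    return any(op(x, t) for t in subquery_result)


# Hypothetical result of SELECT pop FROM User.
pops = [0.9, 0.7, 0.2]

print(quantified_all(0.9, operator.ge, pops))  # True: 0.9 >= every pop
print(quantified_any(0.7, operator.lt, pops))  # True: 0.7 < 0.9
print(quantified_all(0.7, operator.ge, []))    # True: vacuously, over an empty result
print(quantified_any(0.7, operator.lt, []))    # False: no t exists
```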
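• Example: which users are the most popular?
• SELECT *
  FROM User
  WHERE pop >= ALL(SELECT pop FROM User);
• SELECT *
  FROM User
  WHERE NOT
    (pop < ANY(SELECT pop FROM User));
• Use NOT to negate a condition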
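The NOT form works because, for non-NULL values, NOT (x < ANY(S)) is the same as x >= ALL(S). A brute-force Python check over every small list makes the equivalence concrete. The helper names and the pop values are hypothetical.

```python
from itertools import product


def ge_all(x, values):
    """x >= ALL(values)"""
    return all(x >= v for v in values)


def not_lt_any(x, values):
    """NOT (x < ANY(values))"""
    return not any(x < v for v in values)


# Exhaustively compare both forms on every list of length 0 to 3 over a small domain.
domain = [1, 2, 3]
agree = all(
    ge_all(x, vals) == not_lt_any(x, vals)
    for n in range(4)
    for vals in product(domain, repeat=n)
    for x in domain
)

# Most popular users, with hypothetical pop values (note the tie at 0.9).
pops = [0.9, 0.7, 0.9, 0.2]
most_popular = [p for p in pops if ge_all(p, pops)]
most_popular_not = [p for p in pops if not_lt_any(p, pops)]
print(agree, most_popular, most_popular_not)  # True [0.9, 0.9] [0.9, 0.9]
```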
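• More ways to find the most popular users:
• SELECT *
  FROM User AS u
  WHERE NOT EXISTS
    (SELECT * FROM User
     WHERE pop > u.pop);
• SELECT * FROM User
  WHERE uid NOT IN
    (SELECT u1.uid
     FROM User AS u1, User AS u2
     WHERE u1.pop < u2.pop);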
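Both formulations run in SQLite, which does not support ALL or ANY subquery comparisons, so they are handy in practice. Here is a minimal sqlite3 check on hypothetical data with a tie for the top spot. It selects uid instead of * for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: Bart and Lisa tie for the highest pop.
conn.executescript("""
    CREATE TABLE User (uid INTEGER PRIMARY KEY, name TEXT, age INTEGER, pop REAL);
    INSERT INTO User VALUES (142, 'Bart', 10, 0.9), (857, 'Lisa', 8, 0.9),
                            (456, 'Milhouse', 10, 0.2), (123, 'Moe', 40, 0.5);
""")

not_exists = """
    SELECT uid FROM User AS u
    WHERE NOT EXISTS
      (SELECT * FROM User
       WHERE pop > u.pop)
"""
not_in = """
    SELECT uid FROM User
    WHERE uid NOT IN
      (SELECT u1.uid
       FROM User AS u1, User AS u2
       WHERE u1.pop < u2.pop)
"""
r1 = sorted(r[0] for r in conn.execute(not_exists))
r2 = sorted(r[0] for r in conn.execute(not_in))
print(r1, r2)  # [142, 857] [142, 857]
```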
Aggregates
• Standard SQL aggregate functions: COUNT, SUM, AVG, MIN, MAX
• Example: number of users under 18, and their average popularity
• SELECT COUNT(*), AVG(pop)
  FROM User
  WHERE age < 18;
• COUNT(*) counts the number of rows
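Here is the aggregate example in sqlite3 on hypothetical data. It also shows what happens when no row qualifies: COUNT(*) returns 0, while AVG returns NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical User rows; three users are under 18.
conn.executescript("""
    CREATE TABLE User (uid INTEGER PRIMARY KEY, name TEXT, age INTEGER, pop REAL);
    INSERT INTO User VALUES (142, 'Bart', 10, 0.75), (857, 'Lisa', 8, 0.5),
                            (456, 'Milhouse', 10, 0.25), (123, 'Moe', 40, 0.9);
""")

count, avg_pop = conn.execute("""
    SELECT COUNT(*), AVG(pop)
    FROM User
    WHERE age < 18
""").fetchone()
print(count, avg_pop)  # 3 0.5

# No qualifying rows: COUNT(*) is 0, but AVG is NULL (None in Python).
empty = conn.execute("SELECT COUNT(*), AVG(pop) FROM User WHERE age < 5").fetchone()
print(empty)  # (0, None)
```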
• SELECT COUNT(DISTINCT uid)
  FROM Member;
  is equivalent to:
• SELECT COUNT(*)
  FROM (SELECT DISTINCT uid FROM Member) AS T;
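Here is a sqlite3 check of the equivalence on hypothetical Member rows, where one user belongs to two groups. Without DISTINCT, that user would be counted twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Member rows: user 857 belongs to two groups.
conn.executescript("""
    CREATE TABLE Member (uid INTEGER, gid TEXT);
    INSERT INTO Member VALUES (142, 'dps'), (857, 'abc'), (857, 'gov'), (456, 'abc');
""")

with_distinct = conn.execute(
    "SELECT COUNT(DISTINCT uid) FROM Member").fetchone()[0]
via_subquery = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT uid FROM Member) AS T").fetchone()[0]
# Without DISTINCT, user 857 is counted once per membership.
without_distinct = conn.execute("SELECT COUNT(uid) FROM Member").fetchone()[0]
print(with_distinct, via_subquery, without_distinct)  # 3 3 4
```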
Grouping
• SELECT … FROM … WHERE …
  GROUP BY list_of_columns;
Semantics of GROUP BY
SELECT … FROM … WHERE … GROUP BY …;
• Compute FROM (×)
• Compute WHERE (σ)
• Compute GROUP BY: group rows according to the values of GROUP BY columns
• Compute SELECT for each group (π)
• For aggregation functions with DISTINCT inputs, first eliminate duplicates within the group
Number of groups = number of rows in the final output
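Here is GROUP BY on hypothetical data in sqlite3, computing the average popularity per age. This is an illustrative query, not one taken from the slides. The number of output rows equals the number of distinct age values, which is the number of groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical User rows spanning three distinct ages.
conn.executescript("""
    CREATE TABLE User (uid INTEGER PRIMARY KEY, name TEXT, age INTEGER, pop REAL);
    INSERT INTO User VALUES (142, 'Bart', 10, 0.75), (857, 'Lisa', 8, 1.0),
                            (456, 'Milhouse', 10, 0.25), (123, 'Moe', 40, 0.25);
""")

# One output row per group; sorted() only fixes the display order.
rows = sorted(conn.execute("""
    SELECT age, AVG(pop)
    FROM User
    GROUP BY age
""").fetchall())
print(rows)  # [(8, 1.0), (10, 0.5), (40, 0.25)]

# Number of groups = number of distinct GROUP BY values = number of output rows.
n_groups = len({age for (age,) in conn.execute("SELECT age FROM User")})
print(n_groups == len(rows))  # True
```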
Restriction on SELECT
• If a query uses aggregation or GROUP BY, then every column referenced in SELECT must be either
  • Aggregated, or
  • A GROUP BY column
This restriction ensures that any SELECT expression produces only one value for each group
HAVING
• Used to filter groups based on the group properties (e.g., aggregate values, GROUP BY column values)
• SELECT … FROM … WHERE … GROUP BY …
  HAVING condition;
• Compute FROM (×)
• Compute WHERE (σ)
• Compute GROUP BY: group rows according to the values of GROUP BY columns
• Compute HAVING (another σ over the groups)
• Compute SELECT (π) for each group that passes HAVING
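The five steps can be spelled out as a Python pipeline. The helper group_having_select and the row layout are hypothetical. The example query uses HAVING COUNT(*) > 1 only because the toy table is small.

```python
from collections import defaultdict

# Hypothetical User rows: (uid, name, age, pop)
users = [(142, "Bart", 10, 0.75), (857, "Lisa", 8, 1.0),
         (456, "Milhouse", 10, 0.25), (123, "Moe", 40, 0.25)]


def group_having_select(rows, where, key, having, select):
    # 1. FROM: a single table here, so no cross product is needed.
    # 2. WHERE (sigma): filter individual rows.
    kept = [r for r in rows if where(r)]
    # 3. GROUP BY: partition rows by the value of the grouping column(s).
    groups = defaultdict(list)
    for r in kept:
        groups[key(r)].append(r)
    # 4. HAVING (another sigma, now over whole groups).
    passing = {k: g for k, g in groups.items() if having(k, g)}
    # 5. SELECT (pi): one output row per surviving group.
    return [select(k, g) for k, g in passing.items()]


# SELECT age, AVG(pop) FROM User GROUP BY age HAVING COUNT(*) > 1
result = group_having_select(
    users,
    where=lambda r: True,
    key=lambda r: r[2],
    having=lambda age, g: len(g) > 1,
    select=lambda age, g: (age, sum(r[3] for r in g) / len(g)),
)
print(result)  # [(10, 0.5)]
```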
HAVING examples
• List the average popularity for each age group with more than a hundred users
• SELECT age, AVG(pop)
  FROM User
  GROUP BY age
  HAVING COUNT(*) > 100;
• Can be written using WHERE and table expressions
• Find the average popularity for each age group with age over 10
• SELECT age, AVG(pop)
  FROM User
  GROUP BY age
  HAVING age > 10;
• Can be written using WHERE without table expressions
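Both rewrites can be checked in sqlite3 on hypothetical generated data: 150 users aged 10 and 50 users aged 20. In the first pair, the group condition moves into a WHERE over a table expression. In the second pair, rows are filtered before grouping, which works because age is a GROUP BY column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE User (uid INTEGER PRIMARY KEY, name TEXT, age INTEGER, pop REAL)")
# Hypothetical data: 150 users aged 10 and 50 users aged 20.
rows = [(i, f"user{i}", 10 if i < 150 else 20, (i % 4) / 4) for i in range(200)]
conn.executemany("INSERT INTO User VALUES (?, ?, ?, ?)", rows)

having_count = """
    SELECT age, AVG(pop) FROM User
    GROUP BY age
    HAVING COUNT(*) > 100
"""
# Same result with WHERE over a table expression in FROM.
where_count = """
    SELECT age, avg_pop FROM
      (SELECT age, AVG(pop) AS avg_pop, COUNT(*) AS n
       FROM User GROUP BY age) AS T
    WHERE n > 100
"""
having_age = "SELECT age, AVG(pop) FROM User GROUP BY age HAVING age > 10"
# Same result by filtering rows before grouping.
where_age = "SELECT age, AVG(pop) FROM User WHERE age > 10 GROUP BY age"

r_having_count = conn.execute(having_count).fetchall()
r_where_count = conn.execute(where_count).fetchall()
r_having_age = conn.execute(having_age).fetchall()
r_where_age = conn.execute(where_age).fetchall()
```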
ORDER BY
• SELECT [DISTINCT] …
  FROM … WHERE … GROUP BY … HAVING …
  ORDER BY output_column [ASC|DESC], …;
• ASC = ascending, DESC = descending
• Semantics: After the SELECT list has been computed and optional duplicate elimination has been carried out, sort the output according to the ORDER BY specification
ORDER BY example
• List all users, sort them by popularity (descending) and name (ascending)
• SELECT uid, name, age, pop
  FROM User
  ORDER BY pop DESC, name;
• ASC is the default option
• Strictly speaking, only output columns can appear in the ORDER BY clause (although some DBMSs support more)
• Can use sequence numbers instead of names to refer to output columns: ORDER BY 4 DESC, 2
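Here is the ORDER BY example in sqlite3 on hypothetical data with a tie on pop. It shows that the positional form ORDER BY 4 DESC, 2 gives the same order as naming the columns, and that the tie is broken by name in ascending order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical User rows; Bart and Lisa tie on pop.
conn.executescript("""
    CREATE TABLE User (uid INTEGER PRIMARY KEY, name TEXT, age INTEGER, pop REAL);
    INSERT INTO User VALUES (142, 'Bart', 10, 0.9), (857, 'Lisa', 8, 0.9),
                            (456, 'Milhouse', 10, 0.2), (123, 'Moe', 40, 0.5);
""")

by_name = conn.execute("""
    SELECT uid, name, age, pop
    FROM User
    ORDER BY pop DESC, name
""").fetchall()
# Column 4 is pop and column 2 is name in the SELECT list.
by_position = conn.execute("""
    SELECT uid, name, age, pop
    FROM User
    ORDER BY 4 DESC, 2
""").fetchall()
order = [name for _, name, _, _ in by_name]
print(order)  # ['Bart', 'Lisa', 'Moe', 'Milhouse']
```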