0% found this document useful (0 votes)
46 views

SQL: Part I: Introduction To Databases Compsci 316 Fall 2014

This document covers an introduction to SQL and databases. It discusses creating and dropping tables, basic queries using SELECT, projections, joins, and renaming columns. It also covers the differences between set and bag semantics in SQL and how to force set semantics. Finally, it introduces table expressions and scalar subqueries to allow nesting of queries.

Uploaded by

omanfastsolution
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
46 views

SQL: Part I: Introduction To Databases Compsci 316 Fall 2014

This document covers an introduction to SQL and databases. It discusses creating and dropping tables, basic queries using SELECT, projections, joins, and renaming columns. It also covers the differences between set and bag semantics in SQL and how to force set semantics. Finally, it introduces table expressions and scalar subqueries to allow nesting of queries.

Uploaded by

omanfastsolution
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 45

SQL: Part I

Introduction to Databases
CompSci 316 Fall 2014
2

Announcements (Tue. Sep. 15)


• Homework #1 due today 11:59pm
• Ben will have his office hours today 6-8pm in Link
(instead of tomorrow)
• This week only, for Homework #1
• Homework #2 to be posted on website tonight
3

SQL
• SQL: Structured Query Language
• Pronounced “S-Q-L” or “sequel”
• The standard query language supported by most DBMS
• A brief history
• IBM System R
• ANSI SQL89
• ANSI SQL92 (SQL2)
• ANSI SQL99 (SQL3)
• ANSI SQL 2003 (added OLAP, XML, etc.)
• ANSI SQL 2006 (added more XML)
• ANSI SQL 2008, …
4

Creating and dropping tables


• _
_ _
• _
• Examples
!"
#$# % $
& $ # '" '""
( '"
$# (
$# & $ #
$#
)) * % $ )) $ $% $ +
)) ,- $ . # +
)) ,- $ + + +++& $ #+++ /
)) 0 $ +++& +++
5

Basic queries: SFW statement


•, 1
2 ( 1
34
• Also called an SPJ (select-project-join) query
• Corresponds to (but not really equivalent to)
relational algebra query:
, ,…, !" × × ⋯×
6

Example: reading a table

•, 5 2 (
• Single-table query, so no cross product here
• 34 clause is optional
• 5 is a short hand for “all columns”
7

Example: selection and projection


• Name of users under 18
•, 2 ( 34 6'7
• When was Lisa born?
•, 8"'9)
2 (
34 : ; ;
•, list can contain expressions
• Can also use built-in functions such as , , , ,, etc.
• String literals (case sensitive) are enclosed in single
quotes
8

Example: join
• ID’s and names of groups with a user whose name
contains “Simpson”
•, & $ #+ & $ #+
2 ( ( & $ #
34 + : ( +
< ( + : & $ #+
< + /= ;>, # $ >;
• /= matches a string against a pattern
• > matches any sequence of 0 or more characters
• Okay to omit _ in _ _
if _ is unique
9

Example: rename
• ID’s of all pairs of users that belong to one group
• Relational algebra query:
.& !, .& !
' ( ) ⋈ .+ !, .+ !-∧- .& !/ .& ! ' ( )
• SQL:
, '+ , ' 8+ , 8
2 ( ( , ' ( , 8
34 '+ : 8+
< '+ ? 8+
• , keyword is completely optional
10

A more complicated example


• Names of all groups that Lisa and Ralph are both in
, +
2 ( ' 8 ( ' ( 8 & $ #
34 '+ : ; ; < 8+ : ; # ;
< '+ : '+ < 8+ : 8+
< '+ : + < 8+ : +

Tip: Write the 2 ( clause first, then 34 , and


then ,
11

Why SFW statements?


• Out of many possible ways of structuring SQL
statements, why did the designers choose
, )2 ()34 ?
• A large number of queries can be written using only
selection, projection, and cross product (or join)
• Any query that uses only these operators can be written
in a canonical form: 0 1 × ⋯×
• Example: 2. ,3.4 ⋈1 5 ⋈1 6.7 18 9
= 2. ,3.4,6.7 1 ∧1 ∧18 × 5 × 9
•, )2 ()34 captures this canonical form
12

Set versus bag semantics


• Set
• No duplicates
• Relational model and algebra use set semantics
• Bag
• Duplicates allowed
• Number of duplicates is significant
• SQL uses bag semantics by default
13

Set versus bag example


gid
+ !( )
#
$
Member uid gid
'98 # 1
'8! $
7@A
gid
7@A $ ,
2 ( ( #
9@B
$
9@B $
1 1
$

$
1
14

A case for bag semantics


• Efficiency
• Saves time of eliminating duplicates
• Which one is more useful?
• ;+< => )
•, 2 (
• The first query just returns all possible user ages
• The second query returns the user age distribution
• Besides, SQL provides the option of set semantics
with /, /< keyword
15

Forcing set semantics


• ID’s of all pairs of users that belong to one group
•, '+ , ' 8+ , 8
2 ( ( , ' ( , 8
34 '+ : 8+
< '+ ? 8+
• Say Lisa and Ralph are in both the book club and the student
government
•, /, /< '+ , ' 8+
, 8 1
• With /, /< , all duplicate ( ', 8) pairs are removed
from the output
16

Semantics of SFW
•, C /, /< D ? ? 1 ?
2 ( 1
34
• For each in :
For each in : … …
For each in :
If is true over , , …, :
Compute and output ? , ? , …, ? as a row
If /, /< is present
Eliminate duplicate rows in output
• , , …, are often called tuple variables
17

SQL set and bag operations


• </ <, E , /< ,
• Set semantics
• Duplicates in input tables, if any, are first eliminated
• Duplicates in result are also eliminated (for </ <)
• Exactly like set ∪, −, and ∩ in relational algebra
• </ < , E , /< ,
• Bag semantics
• Think of each row as having an implicit count (the
number of times it appears in the table)
• Bag union: sum up the counts from two tables
• Bag difference: proper-subtract the two counts
• Bag intersection: take the minimum of the two counts
18

Examples of bag operations


' 8
fruit fruit
## ##
## $
$ $

, 5 2 ( ' , 5 2 ( ' , 5 2 ( '


</ < E /< ,
, 5 2 ( 8 , 5 2 ( 8 , 5 2 ( 8
fruit fruit fruit
## ## ##
## $
$
##
$
$
19

Examples of set versus bag operations


Poke (uid1, uid2, timestamp)
• , ' 2 ( $F
E
, 8 2 ( $F
• Users who poked others but never got poked by others
• , ' 2 ( $F
E
, 8 2 ( $F
• Users who poked others more than others poke them
20

SQL features covered so far

•, )2 ()34 statements (select-project-


join queries)
• Set and bag operations

Next: how to nest SQL queries


21

Table expression
• Use query result as a table
• In set and bag operations, 2 ( clauses, etc.
• A way to “nest” queries
• Example: names of users who poked others more
than others poked them
•, /, /<
2 (
, ' , 2 ( $F
E
, 8 , 2 ( $F
,
34 + : +
22

Scalar subqueries
• A query that returns a single row can be used as a
value in 34 ,, , etc.
• Example: users at the same age as Bart
•, 5
2 ( What’s Bart’s age?
34 : ,
2 (
34 : ; ;
• Runtime error if subquery returns more than one row
• Under what condition will this error never occur?
• What if the subquery returns no rows?
• The answer is treated as a special value < , and the
comparison with < will fail
23

/< subqueries
• C /< > D ) checks if C is in the result of
> D )
• Example: users at the same age as (some) Bart
•, 5
2 ( What’s Bart’s age?
34 /< ,
2 (
34 : ; ;
24

E/, , subqueries
• E/, , > D ) checks if the result of
> D ) is non-empty
• Example: users at the same age as (some) Bart
•, 5
2 ( ,
34 E/, , , 5 2 (
34 : ; ;
< : +
• This happens to be a correlated subquery—a subquery
that references tuple variables in surrounding queries
25

Semantics of subqueries
•, 5
2 ( ,
34 E/, , , 5 2 (
34 : ; ;
< : +

• For each row in


• Evaluate the subquery with the value of +
• If the result of the subquery is not empty, output +5
• The DBMS query optimizer may choose to process
the query in an equivalent, but more efficient way
(example?)
26

Scoping rule of subqueries


• To find out which table a column belongs to
• Start with the immediately surrounding query
• If not found, look in the one surrounding that; repeat if
necessary
• Use _ _ notation and ,
(renaming) to avoid confusion
27

Another example
•, 5 2 (
34 E/, ,
, 5 2 ( (
34 : +
< E/, ,
, 5 2 ( (
34 : +
< 6? +
• Users who join at least two groups
28

Quantified subqueries
• A quantified subquery can be used syntactically as a
value in a 34 condition
• Universal quantification (for all):
1 34 C > D ) 1
• True iff for all in the result of > D ) , C- -
• Existential quantification (exists):
1 34 C <G > D ) 1
• True iff there exists some in > D ) result such that
C- -
Beware
• In common parlance, “any” and “all” seem to be synonyms
• In SQL, <G really means “some”
29

Examples of quantified subqueries


• Which users are the most popular?

•, 5
2 (
34 #$# ?: , #$# 2 (

•, 5
2 (
34 <
#$# 6 <G , #$# 2 (
Use < to negate a condition
30

More ways to get the most popular


• Which users are the most popular?

•, 5
2 ( ,
34 < E/, ,
, 5 2 (
34 #$# ? +#$#

•, 5 2 (
34 < /<
, '+
2 ( , ' , 8
34 '+#$# 6 8+#$#
31

SQL features covered so far


•, )2 ()34 statements
• Set and bag operations
• Table expressions, subqueries
• Subqueries allow queries to be written in more
declarative ways (recall the “most popular” query)
• But in many cases they don’t add expressive power
• Try translating other forms of subqueries into [< ] E/, ,,
which in turn can be translated into join (and difference)
• Watch out for number of duplicates though

Next: aggregation and grouping


32

Aggregates
• Standard SQL aggregate functions: < , , (,
H&, (/<, ( E
• Example: number of users under 18, and their
average popularity
•, < 5 H& #$#
2 (
34 6 '7
• < 5 counts the number of rows
33

Aggregates with /, /<


• Example: How many users are in some group?

•, < /, /<
2 ( (
is equivalent to:
•, < 5
2 ( , /, /< 2 ( (
34

Grouping
•, 1 2 ( 1 34 1
& G > _ E_ >

• Example: compute average popularity for


each age group
•, H& #$#
2 (
& G
35

Semantics of & G
, 1 2 ( 1 34 1 & G 1
• Compute 2 ( (×)
• Compute 34 ( )
• Compute & G: group rows according to the
values of & G columns
• Compute , for each group ( )
• For aggregation functions with /, /< inputs, first
eliminate duplicates within the group
Number of groups =
number of rows in the final output
36

Example of computing & G


, H& #$# 2 ( & G
uid name age pop
'98 '" "+I
Compute & G: group
7@A 7 "+A rows according to the values
'8! ( $ '" "+8 of & G columns
9@B # 7 "+!
uid name age pop
'98 '" "+I
Compute , '8! ( $ '" "+8
for each group 7@A 7 "+A
9@B # 7 "+!
age avg_pop
'" "+@@
7 "+@"
37

Aggregates with no & G


• An aggregate query with no & G clause =
all rows go into one group
, H& #$# 2 (

Group all rows Aggregate over


into one group the whole group
uid name age pop uid name age pop
'98 '" "+I '98 '" "+I avg_pop
7@A 7 "+A 7@A 7 "+A "+@8@
'8! ( $ '" "+8 '8! ( $ '" "+8
9@B # 7 "+! 9@B # 7 "+!
38

Restriction on ,
• If a query uses aggregation/group by, then every
column referenced in , must be either
• Aggregated, or
• A& G column
This restriction ensures that any ,
expression produces only one value for each group
39

Examples of invalid queries


•,
2 ( & G
• Recall there is one output row per group
• There can be multiple uid values per group
•, ( E #$# 2 (
• Recall there is only one group for an aggregate query
with no & G clause
• There can be multiple uid values
• Wishful thinking (that the output uid value is the one
associated with the highest popularity) does NOT work
Another way of writing the “most popular” query?
40

4 H/<&
• Used to filter groups based on the group properties
(e.g., aggregate values, & G column values)
•, 1 2 ( 1 34 1 & G 1
4 H/<&
• Compute 2 ( (×)
• Compute 34 ( )
• Compute & G: group rows according to the values
of & G columns
• Compute 4 H/<& (another over the groups)
• Compute , ( ) for each group that passes
4 H/<&
41

4 H/<& examples
• List the average popularity for each age group with
more than a hundred users
•, H& #$#
2 (
& G
4 H/<& < 5 ? '""
• Can be written using 34 and table expressions
• Find average popularity for each age group over 10
•, H& #$#
2 (
& G
4 H/<& ? '"
• Can be written using 34 without table expressions
42

SQL features covered so far


•, )2 ()34 statements
• Set and bag operations
• Table expressions, subqueries
• Aggregation and grouping
• More expressive power than relational algebra

Next: ordering output rows


43

G
•, C /, /< D 1
2 ( 1 34 1 & G 1 4 H/<& 1
G _ C , J , D 1
• , = ascending, , = descending
• Semantics: After , list has been computed
and optional duplicate elimination has been carried
out, sort the output according to G
specification
44

G example
• List all users, sort them by popularity (descending)
and name (ascending)
•, #$#
2 (
G #$# ,
• , is the default option
• Strictly speaking, only output columns can appear in
G clause (although some DBMS support more)
• Can use sequence numbers instead of names to refer to
output columns: G 9 , 8
45

SQL features covered so far


•, )2 ()34 statements
• Set and bag operations
• Table expressions, subqueries
• Aggregation and grouping
• Ordering

Next: < ’s, outerjoins, data modification,


constraints, …

You might also like