S4 - RM Ra SQL

RELATIONAL MODEL

Why Study the Relational Model?
Widely used.
Vendors: IBM, Microsoft, Oracle, etc.

Relational Database: Definitions
Relational database: a set of relations.
Relation: made up of 2 parts:
  Schema: specifies the name of the relation, plus the name and type of each column.
    E.g. Students(sid: string, name: string, login: string, age: integer, gpa: real)
  Instance: a table, with rows and columns.

Enrolled
sid    cid     grade
53666  15-101  C
53666  18-203  B
53650  15-112  A
53666  15-105  B

Students
sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@cs    18   3.2
53650  Smith  smith@math  19   3.8

RELATIONAL ALGEBRA

Relational Algebra
  Relational Query Languages
  Selection & Projection
  Union, Set Difference & Intersection
  Cross product & Joins
  Examples

Relational Query Languages
Query languages: allow manipulation and retrieval of data from a database.
Relational model supports simple, powerful QLs:
  Strong formal foundation based on logic.
  Allows for much optimization.
Query languages != programming languages!
  QLs are not expected to be “Turing complete”.
  QLs are not intended to be used for complex calculations.
  QLs support easy, efficient access to large data sets.

Relational Algebra Preliminaries
A query is applied to relation instances, and the result of a query is also a relation instance.
  Schemas of the input relations for a query are fixed (but the query will run over any legal instance).
  The schema for the result of a given query is also fixed. It is determined by the definitions of the query language constructs.
Positional vs. named-field notation:
  Positional notation is easier for formal definitions; named-field notation is more readable.
  Both are used in SQL.

Relational Algebra: 5 Basic Operations
Selection (σ): selects a subset of rows from a relation (horizontal).
Projection (π): retains only wanted columns from a relation (vertical).
Cross-product (×): allows us to combine two relations.
Set-difference (−): tuples in R1, but not in R2.
Union (∪): tuples in R1 or in R2.

Projection (π)
Examples: π_age(S2); π_{sname,rating}(S2)
Retains only attributes that are in the “projection list”.
Schema of result: exactly the fields in the projection list, with the same names that they had in the input relation.
Projection operator has to eliminate duplicates. (How do they arise? Why remove them?)
  Note: real systems typically don’t do duplicate elimination unless the user explicitly asks for it. (Why not?)

Projection (π): example

S2
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

π_{sname,rating}(S2)
sname   rating
yuppy   9
lubber  8
guppy   5
rusty   10

π_age(S2)
age
35.0
55.5

Selection (σ)
Selects rows that satisfy the selection condition.
Result is a relation.
  Schema of the result is the same as that of the input relation.
Do we need to do duplicate elimination?
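
As an illustration of the two operators above, here is a minimal Python sketch that models the S2 instance as a set of tuples in positional notation. The helper names `select` and `project` and the condition `rating > 8` are assumptions made for this example, not part of the slides.

```python
# Relation instance S2 as a set of tuples: (sid, sname, rating, age).
S2 = {
    (28, "yuppy", 9, 35.0),
    (31, "lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "rusty", 10, 35.0),
}

def select(rel, pred):
    """Selection (sigma): keep the rows that satisfy the condition."""
    return {row for row in rel if pred(row)}

def project(rel, positions):
    """Projection (pi): keep the listed column positions.

    Building a set removes duplicate rows automatically.
    """
    return {tuple(row[i] for i in positions) for row in rel}

# Hypothetical selection condition: rating > 8 (rating is position 2).
high_rated = select(S2, lambda row: row[2] > 8)
names_ratings = project(S2, [1, 2])   # pi_{sname,rating}(S2)
ages = project(S2, [3])               # pi_age(S2)

print(sorted(high_rated))
print(sorted(names_ratings))
print(sorted(ages))  # fewer rows than S2: duplicate ages collapse
```

Because selection only drops whole rows of a duplicate-free input, it cannot create duplicates; projection can, which is why the set construction matters in `project`.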

Union and Set-Difference

Cross-Product
S1 × R1: each row of S1 is paired with each row of R1.
Q: How many rows are in the result?
Result schema has one field per field of S1 and R1, with field names “inherited” if possible.
  May have a naming conflict: both S1 and R1 have a field with the same name.
  In this case, can use the renaming operator (ρ).

S1 ⋈ R1 =
sid  sname   rating  age   bid  day
22   dustin  7       45.0  101  10/10/96
58   rusty   10      35.0  103  11/12/96

Natural Join Example
Step 1: compute the cross-product S1 × R1 =
(sid)  sname   rating  age   (sid)  bid  day
22     dustin  7       45.0  22     101  10/10/96
22     dustin  7       45.0  58     103  11/12/96
31     lubber  8       55.5  22     101  10/10/96
31     lubber  8       55.5  58     103  11/12/96
58     rusty   10      35.0  22     101  10/10/96
58     rusty   10      35.0  58     103  11/12/96
Step 2 (σ): keep only the rows in which the two sid fields are equal.
Step 3 (π): drop the duplicate sid column, which leaves S1 ⋈ R1.
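
The three steps of the natural join example can be traced with a short Python sketch over sets of tuples. The variable names are illustrative; the data are the S1 and R1 instances shown in the example.

```python
from itertools import product

# S1: (sid, sname, rating, age); R1: (sid, bid, day)
S1 = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)}
R1 = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}

# Step 1: cross-product, every row of S1 paired with every row of R1.
cross = {s + r for s, r in product(S1, R1)}

# Step 2: selection, keep rows whose two sid fields (positions 0 and 4) match.
matched = {row for row in cross if row[0] == row[4]}

# Step 3: projection, drop the second sid column (position 4).
joined = {row[:4] + row[5:] for row in matched}

print(len(cross))      # number of rows in S1 x R1
print(sorted(joined))  # rows of the natural join
```

The cross-product has |S1| · |R1| rows, which answers the question posed on the Cross-Product slide for these instances.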

Examples

Reserves
sid  bid  day
22   101  10/10/96
58   103  11/12/96

Answers
1. Find names of sailors who have reserved boat #103
Can identify all red or green boats, then find sailors who have reserved one of these boats:
ρ(Tempboats, σ_{color=‘red’ ∨ color=‘green’}(Boats))
ρ(Tempred, π_sid((σ_{color=‘red’}(Boats)) ⋈ Reserves))
ρ(Tempgreen, π_sid((σ_{color=‘green’}(Boats)) ⋈ Reserves))
π_sname(σ_{rating>9}(Sailors))

Answers …
2. Find all sailors who reserved a boat prior to November 1, 1996
π_sname(Sailors ⋈ σ_{day<'11/1/96'}(Reserves))

Answers …
3. Find (the names of) all boats that have been reserved at least once
π_bname(Boats ⋈ Reserves)

Answers …
4. Find all pairs of sailors with the same rating
π_{sname1,sname2}(S1 ⋈_{rating1=rating2 ∧ sid1≠sid2} S2)

Answers …
5. Find all pairs of sailors in which the older sailor has a lower rating
π_{sname1,sname2}(S1 ⋈_{age1>age2 ∧ rating1<rating2} S2)

Relational Algebra
  Relational Query Languages
  Selection & Projection
  Union, Set Difference & Intersection
  Cross product & Joins
  Examples
  Division (additional material)

Last Compound Operator: Division
Useful for expressing “for all” queries like: Find sids of sailors who have reserved all boats.
For A/B, the attributes of B are a subset of the attributes of A.
  May need to “project” to make this happen.
E.g., let A have 2 fields, x and y; B have only field y:
  A/B = { ⟨x⟩ | ∀⟨y⟩ ∈ B (∃⟨x,y⟩ ∈ A) }
A/B contains all x tuples such that for every y tuple in B, there is an xy tuple in A.

Examples of Division A/B

A
sno  pno
s1   p1
s1   p2
s1   p3
s1   p4
s2   p1
s2   p2
s3   p2
s4   p2
s4   p4

B1    B2    B3
pno   pno   pno
p2    p2    p1
      p4    p2
            p4

A/B1  A/B2  A/B3
sno   sno   sno
s1    s1    s1
s2    s4
s3
s4

Expressing A/B Using Basic Operators
Division is not an essential op; just a useful shorthand.
  (Also true of joins, but joins are so common that systems implement joins specially.)
Idea: For A/B, compute all x values that are not “disqualified” by some y value in B.
  An x value is disqualified if, by attaching a y value from B, we obtain an xy tuple that is not in A.

Expressing A/B: π_sno(A) − π_sno((π_sno(A) × B) − A)

Worked example, with A as in the division examples and B = {p1, p2, p4}:

T1 = π_sno(A) × B

T1 − A
sno  pno
s2   p4
s3   p1
s3   p4
s4   p1

T2 = π_sno(T1 − A)
sno
s2
s3
s4

A/B = π_sno(A) − T2
sno
s1
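
The expression π_sno(A) − π_sno((π_sno(A) × B) − A) can be checked with a small Python sketch over sets of tuples. The function name `divide` and the intermediate names `t1` and `t2` are chosen for this illustration (mirroring T1 and T2 above); the data are the A, B1, B2 and B3 instances from the division examples.

```python
from itertools import product

# A: (sno, pno) pairs; each B: single-column relation of pno values.
A = {
    ("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
    ("s2", "p1"), ("s2", "p2"),
    ("s3", "p2"),
    ("s4", "p2"), ("s4", "p4"),
}
B1 = {("p2",)}
B2 = {("p2",), ("p4",)}
B3 = {("p1",), ("p2",), ("p4",)}

def divide(a, b):
    """A/B built only from projection, cross-product and set-difference."""
    xs = {(x,) for x, _ in a}                   # pi_sno(A)
    t1 = {x + y for x, y in product(xs, b)}     # T1 = pi_sno(A) x B
    t2 = {(x,) for x, _ in (t1 - a)}            # T2 = pi_sno(T1 - A): disqualified sno values
    return xs - t2                              # pi_sno(A) - T2

for name, b in (("A/B1", B1), ("A/B2", B2), ("A/B3", B3)):
    print(name, sorted(divide(A, b)))
```

Every sno value that misses at least one pno of B shows up in `t1 - a`, so it lands in `t2` and is subtracted; what remains are the values paired with all of B.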

Moving on to SQL
Database Management Systems (DBMS) store and manage large quantities of data.
We want an intuitive way to ask questions (queries).
You have been taught procedural languages (C, Java), which specify how to solve a problem (or answer a question).
Now, we talk about SQL.
  SQL is a declarative query language.
  We ask what we want and the DBMS is going to deliver!

SQL - A language for Relational DBs
SQL (a.k.a. “Sequel”), standard language
Data Definition Language (DDL)
  create, modify, delete relations
  specify constraints
  administer users, security, etc.
Data Manipulation Language (DML)
  specify queries to find tuples that satisfy criteria
  add, modify, remove tuples

The primary key (PK) of a relation is the column (or group of columns) that uniquely identifies a row.
In other words: two rows cannot have the same PK.

SQL Overview
CREATE TABLE <name> ( <field> <domain>, … )
UPDATE <name>
SET <field name> = <value>
WHERE <condition>
SELECT <fields>
FROM <name>
WHERE <condition>
GROUP BY <fields>
HAVING <condition>
ORDER BY <fields>

Creating Relations in SQL
Creates the Students relation.
Note: the type (domain) of each field is specified, and enforced by the DBMS whenever tuples are added or modified.
CREATE TABLE Students
(sid CHAR(20),
name CHAR(20),
login CHAR(10),
age INTEGER,
gpa FLOAT)

Table Creation (continued)
Another example: the Enrolled table holds information about courses students take.
“Students can take only one course, and no two students in a course receive the same grade.”

Foreign Keys, Referential Integrity
Foreign key: set of fields in one relation that is used to “refer” to a tuple in another relation.
  Must correspond to the primary key of the other relation.
  Like a “logical pointer”.

Enrolled
sid    cid     grade
53666  15-101  C
53666  18-203  B
53650  15-112  A
53666  15-105  B

Students
sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@cs    18   3.2
53650  Smith  smith@math  19   3.8
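
To make the “logical pointer” idea concrete, here is a hedged sketch using Python's built-in sqlite3 module. The Enrolled schema (column types, the composite primary key on sid and cid) and the rejected row with sid '99999' are assumptions for illustration; SQLite only checks foreign keys after `PRAGMA foreign_keys = ON`, and other systems may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE Students (sid CHAR(20) PRIMARY KEY, name CHAR(20), "
    "login CHAR(10), age INTEGER, gpa FLOAT)"
)
# Assumed schema: Enrolled.sid refers to the primary key of Students.
conn.execute(
    "CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), "
    "PRIMARY KEY (sid, cid), "
    "FOREIGN KEY (sid) REFERENCES Students(sid))"
)

conn.execute("INSERT INTO Students VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4)")
conn.execute("INSERT INTO Enrolled VALUES ('53666', '15-101', 'C')")  # refers to Jones

try:
    # Hypothetical sid that matches no Students tuple: a dangling reference.
    conn.execute("INSERT INTO Enrolled VALUES ('99999', '15-101', 'A')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("dangling reference rejected:", rejected)
```

The second insert fails because no Students row has sid '99999', which is what referential integrity is meant to guarantee.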

Adding and Deleting Tuples
Can insert a single tuple using:
INSERT INTO Students (sid, name, login, age, gpa)
VALUES ('53688', 'Smith', 'smith@cs', 18, 3.2)
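
A minimal way to try the insert is Python's built-in sqlite3 module with an in-memory database. The DELETE statement and its condition are an illustrative assumption added to show the other half of the slide title; note also that SQLite is more lenient about column types than most systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid CHAR(20), name CHAR(20), login CHAR(10), "
    "age INTEGER, gpa FLOAT)"
)

# Insert a single tuple, as on the slide.
conn.execute(
    "INSERT INTO Students (sid, name, login, age, gpa) "
    "VALUES ('53688', 'Smith', 'smith@cs', 18, 3.2)"
)
rows = conn.execute("SELECT S.name, S.login FROM Students S").fetchall()
print(rows)

# Illustrative delete: remove every tuple that satisfies the condition.
conn.execute("DELETE FROM Students WHERE name = 'Smith'")
remaining = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
print(remaining)
```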

SELECT S.name, S.login
FROM Students S

name   login
Jones  jones@cs
Smith  smith@ee
White  white@cs

result =
S.name  E.cid
Jones   History105

Basic SQL Query
SELECT [DISTINCT] target-list
FROM relation-list
WHERE qualification

relation-list: a list of relation names, possibly with a range-variable after each name.
target-list: a list of attributes of tables in relation-list.
qualification: comparisons combined using AND, OR and NOT.
  Comparisons are Attr op const or Attr1 op Attr2, where op is one of <, >, =, ≤, ≥, ≠.
DISTINCT: optional keyword indicating that the answer should not contain duplicates.
  In SQL SELECT, the default is that duplicates are not eliminated! (Result is called a “multiset”.)

Query Semantics
Conceptually, a SQL query can be computed:
1. FROM: compute the cross-product of the tables (e.g., Students and Enrolled).
2. WHERE: check conditions, discard tuples that fail (called “selection”).
3. SELECT: delete unwanted fields (called “projection”).
4. If DISTINCT is specified, eliminate duplicate rows.
Probably the least efficient way to compute a query!
  Query optimization helps us find more efficient strategies to get the same answer.
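
The four conceptual steps can be followed literally in a short Python sketch. The query being evaluated, `SELECT DISTINCT S.name FROM Students S, Enrolled E WHERE S.sid = E.sid AND E.grade = 'B'`, is a hypothetical example chosen for this illustration; the data are the Students and Enrolled instances shown earlier.

```python
from itertools import product

# Students: (sid, name, login, age, gpa); Enrolled: (sid, cid, grade)
students = [
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@cs", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
]
enrolled = [
    ("53666", "15-101", "C"),
    ("53666", "18-203", "B"),
    ("53650", "15-112", "A"),
    ("53666", "15-105", "B"),
]

# 1. FROM: cross-product of the tables.
pairs = list(product(students, enrolled))

# 2. WHERE: discard the pairs that fail the conditions (selection).
kept = [(s, e) for s, e in pairs if s[0] == e[0] and e[2] == "B"]

# 3. SELECT: keep only the wanted field (projection); duplicates survive.
names = [s[1] for s, e in kept]

# 4. DISTINCT: eliminate duplicate rows.
distinct_names = sorted(set(names))

print(len(pairs), names, distinct_names)
```

Step 1 materializes every pair even though only a few survive step 2, which is why this strategy is a definition of the result rather than a sensible execution plan.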

Remember the query and the data

Now the Details…
We will use these instances of relations in our examples.

Reserves
sid  bid  day
22   101  10/10/96
95   103  11/12/96

Sailors
sid  sname   rating  age
22   Dustin  7       45.0
31   Lubber  8       55.5
95   Bob     3       63.5

Boats
bid  bname      color
101  Interlake  blue
102  Interlake  red
103  Clipper    green
104  Marine     red

Example Schemas
CREATE TABLE Sailors (sid INTEGER,
   sname CHAR(20), rating INTEGER, age REAL,
   PRIMARY KEY (sid))

Note that the target list can be replaced by “*” if you don’t want to do a projection:
SELECT *
FROM Sailors x
WHERE x.age > 20

Find sailors who have reserved at least one boat
SELECT S.sid
FROM Sailors S, Reserves R
WHERE S.sid=R.sid
Would adding DISTINCT to this query make a difference?
What is the effect of replacing S.sid by S.sname in the SELECT clause?
  Would adding DISTINCT to this variant of the query make a difference?

Expressions
Can use arithmetic expressions in the SELECT clause (plus other operations we’ll discuss later).
Use AS to provide column names:
SELECT S.age, S.age-5 AS age1, 2*S.age AS age2
FROM Sailors S
WHERE S.sname = 'dustin'
Can also have expressions in the WHERE clause:
SELECT S1.sname AS name1, S2.sname AS name2
FROM Sailors S1, Sailors S2
WHERE 2*S1.rating = S2.rating - 1

String operations
SQL also supports some string operations.
“LIKE” is used for string matching.
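
As a small illustration of LIKE, the sketch below runs a pattern match against the Sailors instance using Python's built-in sqlite3 module. The pattern `'_u%'` is an assumption chosen for this example: in standard SQL, `_` matches exactly one character and `%` matches zero or more characters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname CHAR(20), "
    "rating INTEGER, age REAL)"
)
conn.executemany(
    "INSERT INTO Sailors VALUES (?, ?, ?, ?)",
    [(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (95, "Bob", 3, 63.5)],
)

# Names whose second character is 'u', followed by anything.
matches = conn.execute(
    "SELECT S.sname FROM Sailors S WHERE S.sname LIKE '_u%' ORDER BY S.sname"
).fetchall()
print(matches)
```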

SELECT R.sid
FROM Boats B, Reserves R
WHERE R.bid=B.bid
  AND (B.color='red' AND B.color='green')

SELECT R1.sid
FROM Boats B1, Reserves R1, Boats B2, Reserves R2
WHERE R1.sid=R2.sid
  AND R1.bid=B1.bid
  AND R2.bid=B2.bid
  AND (B1.color='red' AND B2.color='green')

AND Continued…
INTERSECT: discussed in the book. Can be used to compute the intersection of any two union-compatible sets of tuples.
Also in text: EXCEPT (sometimes called MINUS).
Included in the SQL/92 standard, but some systems don’t support them.

SELECT S.sid   -- Key field!
FROM Sailors S, Boats B, Reserves R
WHERE S.sid=R.sid
  AND R.bid=B.bid
  AND B.color='red'
INTERSECT
SELECT S.sid
FROM Sailors S, Boats B, Reserves R
WHERE S.sid=R.sid
  AND R.bid=B.bid
  AND B.color='green'
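
The INTERSECT query can be tried against the example instances with Python's built-in sqlite3 module, which supports INTERSECT. In the instances shown earlier no sailor has reserved a red boat, so the sketch adds two hypothetical reservations for sailor 22 (boats 102 and 103, with made-up dates); these extra rows are assumptions, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname CHAR(20), rating INTEGER, age REAL);
CREATE TABLE Boats (bid INTEGER PRIMARY KEY, bname CHAR(20), color CHAR(10));
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day CHAR(10));
INSERT INTO Sailors VALUES (22, 'Dustin', 7, 45.0), (31, 'Lubber', 8, 55.5), (95, 'Bob', 3, 63.5);
INSERT INTO Boats VALUES (101, 'Interlake', 'blue'), (102, 'Interlake', 'red'),
                         (103, 'Clipper', 'green'), (104, 'Marine', 'red');
-- The last two reservations are hypothetical additions for this example.
INSERT INTO Reserves VALUES (22, 101, '10/10/96'), (95, 103, '11/12/96'),
                            (22, 102, '10/11/96'), (22, 103, '10/12/96');
""")

# Sailors who reserved a red boat AND a green boat, via INTERSECT.
both = conn.execute("""
    SELECT S.sid FROM Sailors S, Boats B, Reserves R
    WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red'
    INTERSECT
    SELECT S.sid FROM Sailors S, Boats B, Reserves R
    WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'green'
""").fetchall()

# Single-boat version: one boat row cannot be both red and green.
naive = conn.execute("""
    SELECT R.sid FROM Boats B, Reserves R
    WHERE R.bid = B.bid AND (B.color = 'red' AND B.color = 'green')
""").fetchall()

print(both, naive)
```

The comparison shows why the condition `B.color='red' AND B.color='green'` on a single range variable always yields an empty answer, while intersecting on the key field sid finds the intended sailors.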

Your turn …
1. Find (the names of) all sailors who are over 50 years old
2. Find (the names of) all boats that have been reserved at least once
3. Find all sailors who have not reserved a red boat (hint: use “EXCEPT”)
4. Find all pairs of same-color boats
5. Find all pairs of sailors in which the older sailor has a lower rating

Answers …
1. Find (the names of) all sailors who are over 50 years old
SELECT S.sname
FROM Sailors S
WHERE S.age > 50

Answers …
2. Find (the names of) all boats that have been reserved at least once

DEMO

Getting PostgreSQL
Part of several Linux distributions
[https://www.postgresql.org/download/linux/]

TPC-H: a decision support benchmark
A suite of business-oriented ad-hoc queries.
Queries and data: broad industry-wide relevance while maintaining ease of implementation.
Relevant systems:
  examine large volumes of data
  execute highly complex queries
  answer critical business questions

TPC-H schema
1.2 Database Entities, Relationships, and Characteristics
The components of the TPC-H database are defined to consist of eight separate and individual tables (the Base Tables). The relationships between columns of these tables are illustrated in Figure 2: The TPC-H Schema.
Legend:
  The parentheses following each table name contain the prefix of the column names for that table;
  The arrows point in the direction of the one-to-many relationships between tables;
  The number/formula below each table name represents the cardinality (number of rows) of the table. Some …

TPC-H Q6
select sum(l_extendedprice*l_discount) as revenue
from lineitem
where l_shipdate >= date '[DATE]'
  and l_shipdate < date '[DATE]' + interval '1' year
  and l_discount between [DISCOUNT] - 0.01 and [DISCOUNT] + 0.01
  and l_quantity < [QUANT];
parameters: DATE, DISCOUNT, QUANT

Getting TPC-H
Transaction Processing Performance Council (TPC)
[http://www.tpc.org/tpch/]