2019 02 05 Lecture3 sql2 PDF
February 5, 2019
Data Science CSCI 1951A
Brown University
Instructor: Ellie Pavlick
HTAs: Wennie Zhang, Maulik Dang, Gurnaaz Kaur
1
Follow up from last time
• Do Foreign Keys need to reference Primary Keys?
• NO! The referenced attribute only needs to be unique in the other table (e.g. declared UNIQUE).
• Also, NULLs never compare as equal (NULL = NULL is unknown, not true), so we’d
want the FK to reference an attribute that is NOT NULL too
• i.e. a row with FK = NULL cannot be matched to any row in the other table
2
Follow up from last time
Why do foreign keys have to be unique?
Students
First Name   Last Name   Grade
Donations
Last Name: FK   Amount
15
Follow up from last time
Families
ID   Name
1    Shmo
2    Shmane
3    Shmoao
Students
First Name   Last Name   Grade
Joe          Shmo        A
Jane         Shmane      F
Joao         Shmoao      B
Elan         Shmo        D
Possible Donors
Last Name: FK   Last-Called
Shmo            two seconds ago
Shmane          three seconds ago
Shmoao          one second ago
16
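A minimal sketch of the Families / Possible Donors design above, using Python's built-in sqlite3 module. The table and column spellings (PossibleDonors, LastName, LastCalled) and the try_insert helper are assumptions made for this example; the point is only that the foreign key references Families.Name, which is UNIQUE and NOT NULL but is not the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
CREATE TABLE Families (
    ID   INTEGER PRIMARY KEY,
    Name TEXT NOT NULL UNIQUE          -- not the primary key, but unique and non-NULL
);
CREATE TABLE PossibleDonors (
    LastName   TEXT REFERENCES Families(Name),   -- FK to a non-PK attribute
    LastCalled TEXT
);
""")
conn.executemany("INSERT INTO Families VALUES (?, ?)",
                 [(1, "Shmo"), (2, "Shmane"), (3, "Shmoao")])
conn.execute("INSERT INTO PossibleDonors VALUES ('Shmo', 'two seconds ago')")


def try_insert(last_name):
    """Return True if the donor row is accepted, False if the FK check rejects it."""
    try:
        conn.execute("INSERT INTO PossibleDonors VALUES (?, 'just now')",
                     (last_name,))
        return True
    except sqlite3.IntegrityError:
        return False


accepted_known = try_insert("Shmane")    # a name that exists in Families
accepted_unknown = try_insert("Smith")   # no such family: rejected
accepted_null = try_insert(None)         # NULL FK is accepted but references nothing

# Donor rows that actually match a family; the NULL row never matches.
matched = conn.execute("""
    SELECT COUNT(*)
    FROM PossibleDonors d JOIN Families f ON d.LastName = f.Name
""").fetchone()[0]
```

The NULL row is stored without error, yet it never joins to any family, which is the sense in which FK = NULL does not let you reference the other table.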
Follow up from last time
• Why would I ever use CHAR(n) as opposed to
VARCHAR(n)? Are there any benefits?
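• CHAR(n) can be faster: every value has the same fixed width, so the DBMS does not have to track a per-value length (how much this helps depends on the DBMS)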
17
Announcements
• Have pen/paper or sit by someone who does—this
will help for working through longer in-class
exercises
18
Outline
• Catching up from last lecture (more SQL keywords)
• NULLs
19
Outline
• NULLs
20
ORDER BY
TWEET
ID      Time                 Text
782138  2019-01-04 15:04:57  1951A 4 lyfe
389472  2019-01-01 12:34:56  hey
123794  2019-01-01 12:34:57  lol
127890  2019-01-04 17:30:07  hey
893110  2019-01-06 12:21:53  i <3 1951A
596208  2019-01-02 3:14:15   :-D
173902  2019-01-05 3:34:18   i <3 1951A
SELECT Text
FROM Tweet
ORDER BY Time
Result
Text
hey
lol
:-D
1951A 4 lyfe
hey
i <3 1951A
i <3 1951A
21
ORDER BY
TWEET
ID      Time                 Text
782138  2019-01-04 15:04:57  1951A 4 lyfe
389472  2019-01-01 12:34:56  hey
123794  2019-01-01 12:34:57  lol
127890  2019-01-04 17:30:07  hey
893110  2019-01-06 12:21:53  i <3 1951A
596208  2019-01-02 3:14:15   :-D
173902  2019-01-05 3:34:18   i <3 1951A
SELECT Text
FROM Tweet
ORDER BY ID
Result
Text
lol
hey
i <3 1951A
hey
:-D
1951A 4 lyfe
i <3 1951A
22
GROUP BY
TWEET
ID      Likes          Text
782138  1,000          1951A 4 lyfe
389472  10             hey
123794  100            lol
127890  0              hey
893110  8,000,000      i <3 1951A
596208  1              :-D
173902  1,000,000,000  i <3 1951A
SELECT Text, Count(*), AVG(Likes)
FROM Tweet
GROUP BY Text
Result
Text          Count(*)  AVG(Likes)
lol           1         100
hey           2         5
i <3 1951A    2         504,000,000
:-D           1         1
1951A 4 lyfe  1         1,000
Aggregate functions: SUM, MIN, MAX, COUNT, AVG
24
HAVING
TWEET
ID      Likes          Text
782138  1,000          1951A 4 lyfe
389472  10             hey
123794  100            lol
127890  0              hey
893110  8,000,000      i <3 1951A
596208  1              :-D
173902  1,000,000,000  i <3 1951A
SELECT Text, Count(*), AVG(Likes)
FROM Tweet
GROUP BY Text
HAVING COUNT(*) > 1
Result
Text          Count(*)  AVG(Likes)
hey           2         5
i <3 1951A    2         504,000,000
Aggregate functions: SUM, MIN, MAX, COUNT, AVG
25
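The GROUP BY and HAVING slides can be replayed with Python's built-in sqlite3 module. This is a sketch on the slide's TWEET data; the ORDER BY Text clause is an addition for this example only, so that the row order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tweet (ID INTEGER, Likes INTEGER, Text TEXT)")
conn.executemany("INSERT INTO Tweet VALUES (?, ?, ?)", [
    (782138, 1000, "1951A 4 lyfe"),
    (389472, 10, "hey"),
    (123794, 100, "lol"),
    (127890, 0, "hey"),
    (893110, 8000000, "i <3 1951A"),
    (596208, 1, ":-D"),
    (173902, 1000000000, "i <3 1951A"),
])

# One output row per distinct Text; HAVING then filters whole groups.
rows = conn.execute("""
    SELECT Text, COUNT(*), AVG(Likes)
    FROM Tweet
    GROUP BY Text
    HAVING COUNT(*) > 1
    ORDER BY Text          -- added here only to fix the output order
""").fetchall()

for text, n, avg_likes in rows:
    print(text, n, avg_likes)
```

WHERE filters individual tuples before grouping, while HAVING filters groups after the aggregates are computed, which is why COUNT(*) can appear in HAVING.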
LIKE
TWEET
ID      Likes          Text
782138  1,000          1951A 4 lyfe
389472  10             hey
123794  100            lol
127890  0              hey
893110  8,000,000      i <3 1951A
596208  1              :-D
173902  1,000,000,000  i <3 1951A
SELECT Text, Count(*), AVG(Likes)
FROM Tweet
WHERE Text LIKE ‘%1951A%’
GROUP BY Text
Result
Text          Count(*)  AVG(Likes)
1951A 4 lyfe  1         1,000
i <3 1951A    2         504,000,000
26
IN
STUDENT
ID   Name
1    Wennie
2    Maulik
3    Gurnaaz
4    Jens
5    Erin
GRADES
Student   Course   Grade
1         32       A
2         1951A    A
6         32       A
Find names of students in 1951A
SELECT Name
FROM STUDENT
WHERE ID IN
    (SELECT Student
     FROM GRADES
     WHERE Course = “1951A”
    )
• The inner SELECT is a “subquery” (more later, get excited)
• The subquery returns a “bag” of student IDs
• IN returns True if ID is in that bag
30
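A small sketch of the IN slide using Python's sqlite3 module. It runs the subquery on its own first, so the "bag" of student IDs is visible, and then runs the full query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (ID INTEGER, Name TEXT);
CREATE TABLE GRADES (Student INTEGER, Course TEXT, Grade TEXT);
INSERT INTO STUDENT VALUES
    (1, 'Wennie'), (2, 'Maulik'), (3, 'Gurnaaz'), (4, 'Jens'), (5, 'Erin');
INSERT INTO GRADES VALUES
    (1, '32', 'A'), (2, '1951A', 'A'), (6, '32', 'A');
""")

# The subquery alone: the bag of student IDs enrolled in 1951A.
bag = [row[0] for row in conn.execute(
    "SELECT Student FROM GRADES WHERE Course = '1951A'")]

# The full query: keep a STUDENT row when its ID is in that bag.
names = [row[0] for row in conn.execute("""
    SELECT Name
    FROM STUDENT
    WHERE ID IN (SELECT Student
                 FROM GRADES
                 WHERE Course = '1951A')
""")]

print(bag, names)
```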
ALL/ANY
STUDENT
ID   Name
1    Wennie
2    Maulik
3    Gurnaaz
4    Jens
5    Erin
GRADES
Student   Course   Grade
1         1951A    3.5
2         1951A    3.5
6         1951A    2.8
What is the highest grade in 1951A?
SELECT Grade
FROM GRADES
WHERE Course = “1951A”
AND Grade >= ALL
    (SELECT Grade
     FROM GRADES
     WHERE Course = “1951A”
    )
• ALL returns True if the condition holds for all tuples in the bag
32
ALL/ANY
GRADES
Student   Course   Grade
1         1951A    3.5
2         1951A    3.5
6         1951A    2.8
SELECT Grade
FROM GRADES
WHERE Course = “1951A”
AND Grade > ANY
    (SELECT Grade
     FROM GRADES
     WHERE Course = “1951A”
    )
Return all grades except the lowest one.
34
ALL/ANY
GRADES
Student   Course   Grade
1         1951A    3.5
2         1951A    3.5
6         1951A    2.8
SELECT Grade
FROM GRADES
WHERE Course = “1951A”
AND NOT (Grade > ANY
    (SELECT Grade
     FROM GRADES
     WHERE Course = “1951A”
    ))
Return the lowest grade.
35
ALL/ANY
GRADES
Student   Course   Grade
1         1951A    3.5
2         1951A    3.5
6         1951A    2.8
SELECT Grade
FROM GRADES
WHERE Course = “1951A”
AND Grade >= ALL
    (SELECT Grade
     FROM GRADES
     WHERE Course = “1951A”
    )
Result
Grade
3.5
3.5
37
DISTINCT
GRADES
Student   Course   Grade
1         1951A    3.5
2         1951A    3.5
6         1951A    2.8
SELECT DISTINCT Grade
FROM GRADES
WHERE Course = “1951A”
AND Grade >= ALL
    (SELECT Grade
     FROM GRADES
     WHERE Course = “1951A”
    )
Result
Grade
3.5
40
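The ALL / ANY semantics can be mirrored in plain Python with the built-ins all() and any(). This is only an illustration of what the comparisons mean on the slide's three grades (no NULLs involved); it is not how a DBMS evaluates them, and it is written in Python rather than SQL because SQLite, the engine bundled with Python, does not implement the ALL/ANY operators.

```python
# Grade column of GRADES for course 1951A, as on the slides.
grades = [3.5, 3.5, 2.8]

# Grade >= ALL (subquery): the comparison must hold for every tuple in the bag.
ge_all = [g for g in grades if all(g >= other for other in grades)]

# Grade > ANY (subquery): the comparison must hold for at least one tuple.
gt_any = [g for g in grades if any(g > other for other in grades)]

# NOT (Grade > ANY (subquery)): no tuple in the bag is smaller.
not_gt_any = [g for g in grades if not any(g > other for other in grades)]

# SELECT DISTINCT collapses duplicate result rows.
distinct_ge_all = sorted(set(ge_all))

print(ge_all, gt_any, not_gt_any, distinct_ge_all)
```

The highest grade shows up twice without DISTINCT because two students earned it.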
EXISTS
STUDENT
ID   Name
1    Wennie
2    Maulik
3    Gurnaaz
4    Jens
5    Erin
GRADES
Student   Course   Grade
1         1951A    3.5
2         1951A    3.5
6         1951A    2.8
SELECT Name
FROM STUDENT s
WHERE NOT EXISTS
    (SELECT *
     FROM GRADES
     WHERE Course = “1951A”
     AND Student = s.ID
    )
• EXISTS is True as long as the bag is not empty
• Result: students who are not in 1951A
42
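A sketch of the NOT EXISTS query with Python's sqlite3 module, on the slide's data. The ORDER BY s.ID is added for this example only, to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (ID INTEGER, Name TEXT);
CREATE TABLE GRADES (Student INTEGER, Course TEXT, Grade REAL);
INSERT INTO STUDENT VALUES
    (1, 'Wennie'), (2, 'Maulik'), (3, 'Gurnaaz'), (4, 'Jens'), (5, 'Erin');
INSERT INTO GRADES VALUES
    (1, '1951A', 3.5), (2, '1951A', 3.5), (6, '1951A', 2.8);
""")

# For each student s, the subquery is empty exactly when s has no 1951A row.
not_enrolled = [row[0] for row in conn.execute("""
    SELECT Name
    FROM STUDENT s
    WHERE NOT EXISTS (SELECT *
                      FROM GRADES
                      WHERE Course = '1951A'
                        AND Student = s.ID)
    ORDER BY s.ID          -- added here only to fix the output order
""")]

print(not_enrolled)
```

The subquery mentions s.ID from the outer query, so it is conceptually re-evaluated for every STUDENT row.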
Outline
• Catching up from last lecture (more SQL keywords)
• NULLs
43
NULL!
• Black hole! NULL is NULL is NULL and there is no coming back from it…
• NULL + 1 = NULL
• NULL * 0 = NULL
44
NULL!
p q p OR q p AND q p=q
TRUE TRUE TRUE TRUE TRUE
TRUE FALSE TRUE FALSE FALSE
FALSE TRUE TRUE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE
TRUE UNK TRUE UNK UNK
FALSE UNK UNK FALSE UNK
UNK TRUE TRUE UNK UNK
UNK FALSE UNK FALSE UNK
UNK UNK UNK UNK UNK
46
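The truth table above can be reproduced with a few lines of Python, using None to stand for UNK. The function names or3, and3 and eq3 are made up for this sketch.

```python
UNK = None  # stand-in for SQL's "unknown"


def or3(p, q):
    """Three-valued OR: one TRUE is enough, otherwise UNK is contagious."""
    if p is True or q is True:
        return True
    if p is UNK or q is UNK:
        return UNK
    return False


def and3(p, q):
    """Three-valued AND: one FALSE is enough, otherwise UNK is contagious."""
    if p is False or q is False:
        return False
    if p is UNK or q is UNK:
        return UNK
    return True


def eq3(p, q):
    """Three-valued equality: any comparison involving UNK is UNK."""
    if p is UNK or q is UNK:
        return UNK
    return p == q


# Print the same nine rows as the slide.
for p in (True, False, UNK):
    for q in (True, False, UNK):
        print(p, q, or3(p, q), and3(p, q), eq3(p, q))
```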
NULL!
WHERE: Only tuples which evaluate to true are part of the query result. (I.e. unknown and false are treated equivalently.)
TWEET
ID      Text          Likes
389472  NULL          100
123794  NULL          3
596208  :-D           NULL
782138  1951A 4 lyfe  NULL
173902  i <3 1951A    19
893110  i <3 1951A    7539
SELECT COUNT(*)
FROM TWEET
WHERE Likes != 10
Result
Count(*)
4
47
NULL!
GROUP BY: If NULL exists, then there is a group for NULL.
49
NULL!
To test for NULL in a predicate, use IS NULL or IS NOT NULL (i.e. not “=”)
51
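The three NULL rules above (WHERE drops unknown, GROUP BY makes a NULL group, IS instead of =) can be checked with Python's sqlite3 module. This sketch reuses the TWEET table from the WHERE slide; Python's None is stored as SQL NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TWEET (ID INTEGER, Text TEXT, Likes INTEGER)")
conn.executemany("INSERT INTO TWEET VALUES (?, ?, ?)", [
    (389472, None, 100),
    (123794, None, 3),
    (596208, ":-D", None),
    (782138, "1951A 4 lyfe", None),
    (173902, "i <3 1951A", 19),
    (893110, "i <3 1951A", 7539),
])


def count(where):
    """Run SELECT COUNT(*) FROM TWEET WHERE <where> and return the number."""
    return conn.execute(
        "SELECT COUNT(*) FROM TWEET WHERE " + where).fetchone()[0]


ne_10 = count("Likes != 10")      # NULL != 10 is unknown, so those rows drop out
eq_null = count("Likes = NULL")   # always unknown: matches nothing
is_null = count("Likes IS NULL")  # the right way to test for NULL

# GROUP BY puts all NULL Text values into one group (key None in Python).
groups = dict(conn.execute("SELECT Text, COUNT(*) FROM TWEET GROUP BY Text"))

print(ne_10, eq_null, is_null, groups)
```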
Clicker Question!
What will be the result of this query?
SELECT COUNT(*)
FROM TWEET
WHERE Text != “:)”
Clicker Question!
RUNNERS
ID   Name
1    Wennie
2    Maulik
3    Gurnaaz
RACES
Event_ID   Event     Winner_ID
1          Wennie    2
2          Maulik    3
3          Gurnaaz   2
SELECT COUNT(*)
FROM RUNNERS
WHERE ID NOT IN (SELECT Winner_ID FROM RACES)
(a) Count(*) = 0
(b) Count(*) = 1
(c) Count(*) = 2
55
Outline
• Catching up from last lecture (more SQL keywords)
• NULLs
56
Relational Algebra Recap
• σ<condition>(S): select, return a relation containing just the tuples in
S that meet condition
• S × S’: cross product, return a new relation S’’ such that, for every
t in S and t’ in S’, (t, t’) is in S’’.
57
SQL -> Relational Algebra
• σ<condition>(S): select
• π<attribute_list>(S): projection
• ∪(S, S’): union
• S × S’: cross product
• ρR(S): rename
TWEET
ID      Time      Text
389472  12:34:56  hey
123794  12:34:57  lol
596208  3:14:15   :-D
782138  15:04:57  1951A 4 lyfe
173902  3:34:18   i <3 1951A
893110  12:21:53  i <3 1951A
SQL
SELECT ID, Text
FROM TWEET
Relational Algebra
π<ID,Text>(TWEET)
Result
ID      Text
389472  hey
123794  lol
596208  :-D
782138  1951A 4 lyfe
173902  i <3 1951A
893110  i <3 1951A
59
SQL -> Relational Algebra
TWEET
ID      Time      Text
389472  12:34:56  hey
123794  12:34:57  lol
596208  3:14:15   :-D
782138  15:04:57  1951A 4 lyfe
173902  3:34:18   i <3 1951A
893110  12:21:53  i <3 1951A
SQL
SELECT ID, Text
FROM TWEET
WHERE Text = “hey”
Relational Algebra
π<ID,Text>(σText=“hey”(TWEET))
Result
ID      Text
389472  hey
61
Clicker Question!
62
SQL -> Relational Algebra
PERSON
Handle   Name
m        Maulik
w        Wennie
g        Gurnaaz
RETWEET
Person   Tweet
m        1
m        2
w        1
SQL
SELECT Name
FROM PERSON, RETWEET
WHERE PERSON.Handle = RETWEET.Person
Relational Algebra
π<Name>(σPERSON.Handle = RETWEET.Person(PERSON × RETWEET))
66
SQL -> Relational Algebra
PERSON
Handle   Name
m        Maulik
w        Wennie
g        Gurnaaz
RETWEET
Person   Tweet
m        1
m        2
w        1
SQL
SELECT Name
FROM PERSON AS p,
     RETWEET AS r
WHERE r.Person = p.Handle
Relational Algebra
πName(σp.Handle = r.Person(ρp(PERSON) × ρr(RETWEET)))
68
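The operators used in these translations can be sketched in a few lines of Python, with a relation represented as a list of dicts. The function names (select, project, rename, cross) and the convention of renaming attributes to prefix.attr are choices made for this example, so the projected attribute is spelled p.Name here rather than Name.

```python
from itertools import product


def select(condition, rel):
    """σ<condition>(rel): keep the tuples that satisfy condition."""
    return [t for t in rel if condition(t)]


def project(attrs, rel):
    """π<attrs>(rel): keep only the listed attributes (duplicates are kept)."""
    return [{a: t[a] for a in attrs} for t in rel]


def rename(prefix, rel):
    """ρ<prefix>(rel): qualify every attribute name as prefix.attr."""
    return [{f"{prefix}.{a}": v for a, v in t.items()} for t in rel]


def cross(r, s):
    """r × s: every tuple of r paired with every tuple of s."""
    return [{**t, **u} for t, u in product(r, s)]


PERSON = [{"Handle": "m", "Name": "Maulik"},
          {"Handle": "w", "Name": "Wennie"},
          {"Handle": "g", "Name": "Gurnaaz"}]
RETWEET = [{"Person": "m", "Tweet": 1},
           {"Person": "m", "Tweet": 2},
           {"Person": "w", "Tweet": 1}]

# πName(σ p.Handle = r.Person (ρp(PERSON) × ρr(RETWEET)))
pairs = cross(rename("p", PERSON), rename("r", RETWEET))
joined = select(lambda t: t["p.Handle"] == t["r.Person"], pairs)
names = project(["p.Name"], joined)

print(len(pairs), names)
```

The cross product builds all 3 × 3 pairs before the selection throws most of them away, which is the cost the later execution-order slides are about.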
Execution Order
SELECT A1…An
FROM R1…Rk
WHERE P
πA1…An(σP(((R1 × R2) × …) × Rk))
69
SQL -> Relational Algebra
TWEET
ID      Time      Text
389472  12:34:56  hey
123794  12:34:57  lol
596208  3:14:15   :-D
782138  15:04:57  1951A 4 lyfe
173902  3:34:18   i <3 1951A
893110  12:21:53  i <3 1951A
SQL
SELECT ID, Text
FROM TWEET
WHERE Text = “hey”
Relational Algebra (tree, read bottom-up)
πID,Text
σText=“hey”
TWEET
Result
ID      Text
389472  hey
73
Clicker Question!
Which is better?
(a) σ<condition>(π<attr_list>(R))
(b) π<attr_list>(σ<condition>(R))
76
SQL -> Relational Algebra
TWEET
ID      Time      Text
389472  12:34:56  hey
123794  12:34:57  lol
596208  3:14:15   :-D
782138  15:04:57  1951A 4 lyfe
173902  3:34:18   i <3 1951A
893110  12:21:53  i <3 1951A
SQL
SELECT ID, Text
FROM TWEET
WHERE Text = “hey”
Relational Algebra
σText=“hey”(π<ID,Text>(TWEET))
π<ID,Text>(σText=“hey”(TWEET))
77
SQL -> Relational Algebra
TWEET
ID      Time      Text
389472  12:34:56  hey
123794  12:34:57  lol
596208  3:14:15   :-D
782138  15:04:57  1951A 4 lyfe
173902  3:34:18   i <3 1951A
893110  12:21:53  i <3 1951A
SQL
SELECT ID, Time
FROM TWEET
WHERE Text = “hey”
Relational Algebra
(a) σText=“hey”(π<ID,Time>(TWEET))
(b) π<ID,Time>(σText=“hey”(TWEET))
Evaluating (a): π<ID,Time>(TWEET) runs first and drops Text
ID      Time
389472  12:34:56
123794  12:34:57
596208  3:14:15
782138  15:04:57
173902  3:34:18
893110  12:21:53
σText=“hey” can no longer be applied, because the Text attribute is gone.
Evaluating (b): σText=“hey”(TWEET) runs first
ID      Time      Text
389472  12:34:56  hey
then π<ID,Time> keeps only the requested attributes
ID      Time
389472  12:34:56
85
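The difference between the two orders can be checked directly. In this Python sketch (relations as lists of dicts; select and project are names chosen for the example), projecting first removes Text, so the later selection fails with a KeyError, while selecting first works.

```python
TWEET = [
    {"ID": 389472, "Time": "12:34:56", "Text": "hey"},
    {"ID": 123794, "Time": "12:34:57", "Text": "lol"},
    {"ID": 596208, "Time": "3:14:15", "Text": ":-D"},
    {"ID": 782138, "Time": "15:04:57", "Text": "1951A 4 lyfe"},
    {"ID": 173902, "Time": "3:34:18", "Text": "i <3 1951A"},
    {"ID": 893110, "Time": "12:21:53", "Text": "i <3 1951A"},
]


def select(condition, rel):
    """σ: keep the tuples that satisfy condition."""
    return [t for t in rel if condition(t)]


def project(attrs, rel):
    """π: keep only the listed attributes."""
    return [{a: t[a] for a in attrs} for t in rel]


def is_hey(t):
    return t["Text"] == "hey"


# (b) π<ID,Time>(σText="hey"(TWEET)): select while Text is still available.
plan_b = project(["ID", "Time"], select(is_hey, TWEET))

# (a) σText="hey"(π<ID,Time>(TWEET)): Text has already been projected away.
plan_a = None
missing_attr = None
try:
    plan_a = select(is_hey, project(["ID", "Time"], TWEET))
except KeyError as err:
    missing_attr = err.args[0]

print(plan_b, plan_a, missing_attr)
```

When the projection keeps every attribute the condition needs (as with SELECT ID, Text), both orders give the same answer.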
Execution Order
SELECT A1…An
FROM R1…Rk
WHERE P
πA1…An(σP(((R1 × R2) × …) × Rk))
“Canonical Execution Order” (FROM -> WHERE -> SELECT)
86
Clicker Question!
How much memory do I need?
Say each R has O(m) tuples.
SELECT A1…An
FROM R1…Rk
WHERE P
πA1…An(σP(((R1 × R2) × …) × Rk))
• R1 × R2: m × m tuples
• (R1 × R2) × R3: (m × m) × m tuples
• (… × Rk-1) × Rk: m^(k-1) × m = m^k tuples
(a) O(m^k)
(b) O(m × k)
(c) O(m + k)
(d) O(m^(k-n))
m = 1000, k = 3 -> 1 billion tuples
93
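A quick way to convince yourself of the m^k growth is to count the tuples of a k-way cross product for small m. The helper name cross_product_size is made up for this sketch; it walks the product lazily, so it only counts and never stores the tuples.

```python
from itertools import product


def cross_product_size(m, k):
    """Count the tuples in R1 × … × Rk when every Ri has m tuples."""
    relations = [range(m)] * k          # k stand-in relations of m tuples each
    return sum(1 for _ in product(*relations))


small = cross_product_size(10, 3)       # small enough to enumerate
predicted = 1000 ** 3                   # the slide's m = 1000, k = 3 case

print(small, predicted)
```

A canonical plan that materializes the cross product needs room for all of those tuples before the selection removes any of them.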
Execution Order
SELECT TWEET.Time
FROM TWEET, AUTHOR
WHERE AUTHOR.TWEET = TWEET.ID
AND TWEET.Date = ‘01/01/2019’
AND AUTHOR.Person = “BarackObama”
Canonical plan
πTWEET.Time(σ(A.TWEET = T.ID)⋀(T.Date=“1/1/19”)⋀(A.Person=“BarackObama”)(TWEET × AUTHOR))
• TWEET: 6,000/second = 500M/day = billions and billions of tuples
• AUTHOR: 100s of millions of tuples
• Output of σ: O(kind of tiny)
Thoughts??
http://www.internetlivestats.com/twitter-statistics/
https://www.omnicoreagency.com/twitter-statistics/
100
Execution Order
SELECT TWEET.Time
FROM TWEET, AUTHOR
WHERE AUTHOR.TWEET = TWEET.ID
AND TWEET.Date = ‘01/01/2019’
AND AUTHOR.Person = “BarackObama”
Push the Date selection below the cross product:
πTWEET.Time(σ(A.TWEET = T.ID)⋀(A.Person=“BarackObama”)(σDate=“1/1/19”(TWEET) × AUTHOR))
101
Execution Order
Push the Person selection down as well:
πTWEET.Time(σA.TWEET = T.ID(σDate=“1/1/19”(TWEET) × σPerson=“BarackObama”(AUTHOR)))
102
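The effect of pushing the selections down can be seen by counting how many pairs each plan has to build. The data below is hypothetical toy data (100 tweets, made-up user names and times), not real Twitter numbers; both plans return the same answer, but the intermediate cross products differ greatly in size.

```python
from itertools import product

# Hypothetical toy tables; the real ones would hold billions of tuples.
TWEET = [{"ID": i,
          "Date": "1/1/19" if i % 10 == 0 else "1/2/19",
          "Time": f"t{i}"} for i in range(100)]
AUTHOR = [{"TWEET": i,
           "Person": "BarackObama" if i % 25 == 0 else f"user{i}"}
          for i in range(100)]


def canonical():
    """Cross product first, then one big selection, then projection."""
    pairs = list(product(TWEET, AUTHOR))
    out = [t["Time"] for t, a in pairs
           if a["TWEET"] == t["ID"]
           and t["Date"] == "1/1/19"
           and a["Person"] == "BarackObama"]
    return out, len(pairs)


def pushed_down():
    """Filter each table first, then cross product and join condition."""
    tweets = [t for t in TWEET if t["Date"] == "1/1/19"]
    authors = [a for a in AUTHOR if a["Person"] == "BarackObama"]
    pairs = list(product(tweets, authors))
    out = [t["Time"] for t, a in pairs if a["TWEET"] == t["ID"]]
    return out, len(pairs)


times_canonical, pairs_canonical = canonical()
times_pushed, pairs_pushed = pushed_down()
print(pairs_canonical, pairs_pushed, times_pushed)
```

Only conditions that mention a single table can be pushed below the cross product; the join condition A.TWEET = T.ID needs attributes from both sides and has to stay above it.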
Clicker Question! (Demand?)
Optimize this.
Find grades of students taking 1951A ahead of schedule
STUDENT
ID   Name      Year
1    Wennie    4
2    Maulik    5
3    Gurnaaz   5
4    Jens      4
5    Erin      4
GRADES
Student   Course   Grade   Tgt_Yr
1         32       A       1
2         1951A    A       3
6         32       A       1
SELECT Grade
FROM STUDENT, GRADES
WHERE STUDENT.ID = GRADES.Student
AND GRADES.Course = ‘1951A’
AND STUDENT.Year < GRADES.Tgt_Yr
πGrade(σ(ID = Student)⋀(Course = 1951A)⋀(Year < Tgt_Yr)(STUDENT × GRADES))
103
Clicker Question!
πGrade
(a) πGrade
(c)
σID = Student σID = Student ⋀ σYear < Tgt_Yr
× ×
σYear < Tgt_Yr σCourse = 1951A σCourse = 1951A
STUDENT
(b) πGrade
×
σID = Student σCourse = 1951A
× ×
σYear < Tgt_Yr σCourse = 1951A σCourse = 1951A
STUDENT
(b) πGrade
σYear < Tgt_Yr (depends on output of join)
×
σID = Student σCourse = 1951A
• NULLs
107
Nested Queries
STUDENT
ID   Name      Year
1    Wennie    4
2    Maulik    5
3    Gurnaaz   5
4    Jens      4
5    Erin      4
GRADES
Student   Course   GPA   Tgt_Yr
1         32       4.0   1
2         1951A    3.5   3
6         32       2.8   1
SELECT s.Name
FROM STUDENT s
WHERE NOT EXISTS(
    SELECT *
    FROM GRADES
    WHERE s.ID = GRADES.Student
)
112
Clicker Question!
STUDENT
ID   Name      Year
1    Wennie    4
2    Maulik    5
3    Gurnaaz   5
4    Jens      4
5    Erin      4
GRADES
Student   Course   GPA   Tgt_Yr
1         32       4.0   1
2         1951A    3.5   3
6         32       2.8   1
How many courses is each student taking?
SELECT s.ID, s.Name,
    (SELECT COUNT(*)
     FROM GRADES g
     WHERE s.ID = g.Student) AS num_courses
FROM STUDENT s
Is this query correlated?
(a) uh huh (b) nuh uh
Yes! The subquery's value is recomputed for every row (i.e. for every s.ID)
114
Nested Queries
STUDENT
ID   Name      Year
1    Wennie    4
2    Maulik    5
3    Gurnaaz   5
4    Jens      4
5    Erin      4
GRADES
Student   Course   GPA   Tgt_Yr
1         32       4.0   1
2         1951A    3.5   3
6         32       2.8   1
How many courses is each student taking?
SELECT s.ID, s.Name, c.num_courses
FROM STUDENT s,
    (SELECT Student,
            COUNT(*) AS num_courses
     FROM GRADES
     GROUP BY Student) c
WHERE s.ID = c.Student
115
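Running both versions of the "how many courses" query with Python's sqlite3 module shows one thing worth keeping in mind when rewriting: on this data they are not quite equivalent. The correlated version reports 0 for students with no GRADES rows, while the version that joins against the grouped subquery drops those students. ORDER BY s.ID is added for this sketch only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (ID INTEGER, Name TEXT, Year INTEGER);
CREATE TABLE GRADES (Student INTEGER, Course TEXT, GPA REAL, Tgt_Yr INTEGER);
INSERT INTO STUDENT VALUES
    (1, 'Wennie', 4), (2, 'Maulik', 5), (3, 'Gurnaaz', 5),
    (4, 'Jens', 4), (5, 'Erin', 4);
INSERT INTO GRADES VALUES
    (1, '32', 4.0, 1), (2, '1951A', 3.5, 3), (6, '32', 2.8, 1);
""")

# Correlated scalar subquery: evaluated once per STUDENT row.
correlated = conn.execute("""
    SELECT s.ID, s.Name,
           (SELECT COUNT(*) FROM GRADES g WHERE s.ID = g.Student) AS num_courses
    FROM STUDENT s
    ORDER BY s.ID
""").fetchall()

# Uncorrelated subquery in FROM: grouped once, then joined.
derived = conn.execute("""
    SELECT s.ID, s.Name, c.num_courses
    FROM STUDENT s,
         (SELECT Student, COUNT(*) AS num_courses
          FROM GRADES
          GROUP BY Student) c
    WHERE s.ID = c.Student
    ORDER BY s.ID
""").fetchall()

print(correlated)
print(derived)
```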
Rewriting Queries
How many courses is each student taking?
119
(non)Clicker Question!
Rewrite to remove the subquery altogether?
STUDENT
ID   Name      Year
1    Wennie    4
2    Maulik    5
3    Gurnaaz   5
4    Jens      4
5    Erin      4
GRADES
Student   Course   GPA   Tgt_Yr
1         32       4.0   1
2         1951A    3.5   3
6         32       2.8   1
SELECT s.Name
FROM STUDENT s
WHERE EXISTS(
    SELECT * FROM GRADES
    WHERE s.ID = GRADES.Student
    AND s.Year < GRADES.Tgt_Yr
)
SELECT s.Name
FROM STUDENT s, GRADES g
WHERE s.ID = g.Student
AND s.Year < g.Tgt_Yr
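One caveat when replacing EXISTS with a join, sketched with Python's sqlite3 module: EXISTS yields each student at most once, while the join yields one row per matching GRADES row. The rows below are hypothetical (they are not the slide's data, where no student is ahead of schedule) and are chosen so that one student matches twice; adding DISTINCT is one possible fix, assuming names are unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (ID INTEGER, Name TEXT, Year INTEGER);
CREATE TABLE GRADES (Student INTEGER, Course TEXT, GPA REAL, Tgt_Yr INTEGER);
-- Hypothetical rows: Wennie (Year 1) is ahead of schedule in two courses.
INSERT INTO STUDENT VALUES (1, 'Wennie', 1), (2, 'Maulik', 5);
INSERT INTO GRADES VALUES
    (1, '32', 4.0, 2), (1, '1951A', 3.5, 3), (2, '1951A', 3.5, 3);
""")


def names(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]


with_exists = names("""
    SELECT s.Name FROM STUDENT s
    WHERE EXISTS (SELECT * FROM GRADES
                  WHERE s.ID = GRADES.Student AND s.Year < GRADES.Tgt_Yr)
""")

with_join = names("""
    SELECT s.Name FROM STUDENT s, GRADES g
    WHERE s.ID = g.Student AND s.Year < g.Tgt_Yr
""")

with_join_distinct = names("""
    SELECT DISTINCT s.Name FROM STUDENT s, GRADES g
    WHERE s.ID = g.Student AND s.Year < g.Tgt_Yr
""")

print(with_exists, with_join, with_join_distinct)
```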