Week2 Lecture
Introduction to Databases
Databases
Logical Design (The Relational Algebra)
Chapter 4
Review
Queries / Answers
DBMS layers, top to bottom:
• Query Optimization and Execution
• Relational Operators
• Files and Access Methods
• Buffer Management
DB
Today
Queries / Answers
• Query Compilation
• Buffer Management
DB
Recall our query example
• Given the following instances of Employees and Works_In
• Return all department ids and parking lots for employees with name “Mohamed”
Review
SQL Query → Query Compiler → Relational Algebra

SQL Query:
SELECT E.lot, W.did
FROM Employees E, Works_In W
WHERE E.cin = W.cin
AND E.name = 'Mohamed'

Relational Algebra:
$\pi_{E.lot,\,W.did}(\sigma_{E.name = \text{"Mohamed"}}(Employees \bowtie_{E.cin = W.cin} Works\_In))$

Relational Algebra == Logical Query Plan
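Logical Query Plan → Optimization → Physical Query Plan

Logical query plan (output of parsing), top to bottom:
• $\pi_{E.lot,\,W.did}$
• $\sigma_{E.name = \text{"Mohamed"}}$
• $\bowtie_{E.cin = W.cin}$
• Employees, Works_in

Physical query plan (output of optimization), top to bottom:
• $\pi_{E.lot,\,W.did}$: on-the-fly (Project Iterator)
• $\bowtie_{E.cin = W.cin}$: nested loop (Join Iterator), with the two inputs below
• $\sigma_{E.name = \text{"Mohamed"}}$ on Employees: indexed, B+tree (Index Scan Iterator)
• Works_in: heap scan (Heap Scan Iterator)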
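The physical plan can be sketched with Python generators standing in for iterators. This is only an illustration of the iterator idea, not how a real DBMS implements it: the table contents are hypothetical (the lecture's instances are not reproduced here), a plain dictionary stands in for the B+tree index, and all function names are made up for this sketch.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]

# Hypothetical instances, invented for this sketch.
EMPLOYEES: List[Row] = [
    {"E.cin": 1, "E.name": "Mohamed", "E.lot": 12},
    {"E.cin": 2, "E.name": "Sara", "E.lot": 7},
    {"E.cin": 3, "E.name": "Mohamed", "E.lot": 30},
]
WORKS_IN: List[Row] = [
    {"W.cin": 1, "W.did": 10},
    {"W.cin": 2, "W.did": 10},
    {"W.cin": 3, "W.did": 20},
    {"W.cin": 3, "W.did": 40},
]


def build_name_index(table: List[Row]) -> Dict[object, List[Row]]:
    """Build a dictionary on E.name (a stand-in for the B+tree index)."""
    index: Dict[object, List[Row]] = {}
    for row in table:
        index.setdefault(row["E.name"], []).append(row)
    return index


def index_scan(index: Dict[object, List[Row]], key: object) -> Iterator[Row]:
    """Index Scan Iterator: yield only the rows matching the key."""
    yield from index.get(key, [])


def heap_scan(table: List[Row]) -> Iterator[Row]:
    """Heap Scan Iterator: yield every row of the table."""
    yield from table


def nested_loop_join(outer: Iterable[Row],
                     inner: Callable[[], Iterable[Row]],
                     left_key: str, right_key: str) -> Iterator[Row]:
    """Join Iterator: rescan the inner input once per outer row."""
    for left in outer:
        for right in inner():
            if left[left_key] == right[right_key]:
                yield {**left, **right}


def project(rows: Iterable[Row], columns: List[str]) -> Iterator[Tuple[object, ...]]:
    """Project Iterator: keep the requested columns on the fly."""
    for row in rows:
        yield tuple(row[c] for c in columns)


def run_plan() -> List[Tuple[object, ...]]:
    name_index = build_name_index(EMPLOYEES)
    selected = index_scan(name_index, "Mohamed")
    joined = nested_loop_join(selected, lambda: heap_scan(WORKS_IN),
                              "E.cin", "W.cin")
    return list(project(joined, ["E.lot", "W.did"]))


if __name__ == "__main__":
    print(run_plan())
```

Each operator pulls rows from the operator below it only when asked, so no intermediate relation is materialized.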
Formal Relational Query Languages
Why Learn Relational Algebra
Relational Algebra
Preliminaries
Example Instances
• “Sailors” and “Reserves” relations for our examples.
• We’ll use positional or named field notation, and assume that names of fields in query results are “inherited” from names of fields in query input relations.

R1
sid  bid  day
22   101  10/10/96
58   103  11/12/96

S1
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

S2
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0
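One way to experiment with these instances is to write them as Python sets of named tuples. This is just a sketch: the class names `Sailor` and `Reserve` are assumptions made here, and a `NamedTuple` is used because it supports both the positional and the named field notation mentioned above.

```python
from typing import NamedTuple


class Sailor(NamedTuple):
    sid: int
    sname: str
    rating: int
    age: float


class Reserve(NamedTuple):
    sid: int
    bid: int
    day: str


# A relation is a set of tuples, so duplicates cannot occur.
R1 = {
    Reserve(22, 101, "10/10/96"),
    Reserve(58, 103, "11/12/96"),
}
S1 = {
    Sailor(22, "dustin", 7, 45.0),
    Sailor(31, "lubber", 8, 55.5),
    Sailor(58, "rusty", 10, 35.0),
}
S2 = {
    Sailor(28, "yuppy", 9, 35.0),
    Sailor(31, "lubber", 8, 55.5),
    Sailor(44, "guppy", 5, 35.0),
    Sailor(58, "rusty", 10, 35.0),
}

if __name__ == "__main__":
    for s in sorted(S1):
        # Positional notation (s[1]) and named notation (s.sname) agree.
        print(s[1], s.sname)
```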
Relational Algebra
• Pure relational algebra operates on sets
  • No duplicate tuples
  • In contrast to SQL, which operates on bags (multisets)
• Basic operations:
  • Selection ($\sigma$): Selects a subset of rows from a relation.
  • Projection ($\pi$): Deletes unwanted columns from a relation.
  • Cross-product ($\times$): Allows us to combine two relations.
  • Set-difference ($-$): Tuples in relation 1 but not in relation 2.
  • Union ($\cup$): Tuples in relation 1 or in relation 2 (or both).
• Additional operations:
  • Intersection, join, division, renaming: Not essential, but (very!) useful.
• Since each operation returns a relation, operations can be composed! (Algebra is “closed”.)
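The five basic operations can be sketched as small Python functions over sets of tuples. The function names are chosen for this example only, and fields are addressed by position. Because every function returns a set of tuples, the results compose, which is what "closed" means here.

```python
S1 = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)}
S2 = {(28, "yuppy", 9, 35.0), (31, "lubber", 8, 55.5),
      (44, "guppy", 5, 35.0), (58, "rusty", 10, 35.0)}
R1 = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}


def select(rel, pred):
    """Selection: keep the rows that satisfy the predicate."""
    return {t for t in rel if pred(t)}


def project(rel, cols):
    """Projection: keep the listed column positions; the set drops duplicates."""
    return {tuple(t[c] for c in cols) for t in rel}


def cross(r, s):
    """Cross-product: every row of r concatenated with every row of s."""
    return {a + b for a in r for b in s}


def difference(r, s):
    """Set-difference: rows in r but not in s."""
    return r - s


def union(r, s):
    """Union: rows in r or in s."""
    return r | s


if __name__ == "__main__":
    # Composition: project the result of a selection.
    print(project(select(S2, lambda t: t[2] > 8), [1, 2]))
    print(len(cross(S1, R1)))
```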
Relational Algebra
• Additional operations that are not essential, but (very!) useful
  • Intersection ($\cap$): derivable from set-difference, $A \cap B = A - (A - B)$; likewise $A \cup B = (A \cap B) \cup (A - B) \cup (B - A)$
  • Join ($\bowtie$ or $\bowtie_\theta$)
  • Division ($/$)
  • Renaming ($\rho$)
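The intersection identity is easy to check on a small case. The sets `A` and `B` below are arbitrary illustrative data, and `intersect_via_difference` is a name invented for this sketch.

```python
# Arbitrary example relations with the same schema.
A = {(1, "a"), (2, "b"), (3, "c")}
B = {(2, "b"), (3, "c"), (4, "d")}


def intersect_via_difference(a, b):
    """Intersection written with set-difference only: A - (A - B)."""
    return a - (a - b)


if __name__ == "__main__":
    print(intersect_via_difference(A, B) == (A & B))
    # The union splits into three disjoint parts.
    print(((A & B) | (A - B) | (B - A)) == (A | B))
```

A check on one example is not a proof, but it shows why intersection adds no expressive power to the basic operations.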
Relational Algebra vs SQL
Projection
• Corresponds to the SELECT list in SQL
• Deletes attributes that are not in the projection list.
• Schema of result contains exactly the fields in the projection list, with the same names that they had in the (only) input relation.

S2
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

$\pi_{sname,\,rating}(S2)$
sname   rating
yuppy   9
lubber  8
guppy   5
rusty   10
Projection
• Projection operator has to eliminate duplicates! (Why??)
• Note: real systems typically don’t do duplicate elimination unless the user explicitly asks for it. (Why not?)
• Why SQL does not eliminate duplicates by default: either the duplicates are not needed but eliminating them is expensive, or the duplicates are needed, e.g. to count how many students named Mohamed are in the class.

S2 and $\pi_{sname,\,rating}(S2)$ are the same as on the previous slide.

$\pi_{age}(S2)$
age
35.0
55.5
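The difference between set and bag semantics shows up directly when projecting S2 on age. In this sketch a Python list plays the role of a bag (duplicates kept) and a set plays the role of a relation in pure relational algebra; the variable names are chosen for the example.

```python
from collections import Counter

# S2 in the order shown on the slide.
S2 = [
    (28, "yuppy", 9, 35.0),
    (31, "lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "rusty", 10, 35.0),
]

# Bag projection: one output row per input row, duplicates kept.
bag_ages = [row[3] for row in S2]

# Set projection: duplicates eliminated, as relational algebra requires.
set_ages = set(bag_ages)

# Keeping duplicates makes counting possible.
age_counts = Counter(bag_ages)

if __name__ == "__main__":
    print(bag_ages)
    print(sorted(set_ages))
    print(age_counts[35.0])
```

The bag version needs a single pass over the input, while the set version has to remember or sort what it has already produced, which is the extra cost the note above refers to.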
Selection
• Corresponds to the WHERE clause in SQL
• Selects rows that satisfy selection condition.
• Schema of result identical to schema of (only) input relation.
• No duplicates in result! (Why?)
• Result relation can be the input for another relational algebra operation! (Operator composition.)

S2
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0

$\sigma_{rating > 8}(S2)$
sid  sname  rating  age
28   yuppy  9       35.0
58   rusty  10      35.0

$\pi_{sname,\,rating}(\sigma_{rating > 8}(S2))$
sname  rating
yuppy  9
rusty  10
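The SQL counterpart of $\pi_{sname,\,rating}(\sigma_{rating > 8}(S2))$ can be tried with Python's built-in `sqlite3` module. This is a sketch under the assumption that an in-memory SQLite table is an acceptable stand-in for S2. `DISTINCT` is written out because SQL keeps duplicates unless asked, and `ORDER BY` is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE S2 (sid INTEGER, sname TEXT, rating INTEGER, age REAL)"
)
conn.executemany(
    "INSERT INTO S2 VALUES (?, ?, ?, ?)",
    [
        (28, "yuppy", 9, 35.0),
        (31, "lubber", 8, 55.5),
        (44, "guppy", 5, 35.0),
        (58, "rusty", 10, 35.0),
    ],
)

# WHERE plays the role of selection, the SELECT list that of projection.
rows = conn.execute(
    "SELECT DISTINCT sname, rating FROM S2 WHERE rating > 8 ORDER BY sname"
).fetchall()

if __name__ == "__main__":
    print(rows)
```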
Union, Intersection, Set-Difference

$S1 \cup S2$
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0
44   guppy   5       35.0
28   yuppy   9       35.0

$S1 - S2$
sid  sname   rating  age
22   dustin  7       45.0

$S1 \cap S2$
sid  sname   rating  age
31   lubber  8       55.5
58   rusty   10      35.0
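The three result tables can be reproduced with Python's built-in set operators, assuming S1 and S2 are written as sets of tuples with the same field order.

```python
S1 = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)}
S2 = {(28, "yuppy", 9, 35.0), (31, "lubber", 8, 55.5),
      (44, "guppy", 5, 35.0), (58, "rusty", 10, 35.0)}

union_result = S1 | S2          # rows in S1 or S2; shared rows appear once
difference_result = S1 - S2     # rows in S1 that are not in S2
intersection_result = S1 & S2   # rows in both S1 and S2

if __name__ == "__main__":
    for name, rel in [("S1 union S2", union_result),
                      ("S1 minus S2", difference_result),
                      ("S1 intersect S2", intersection_result)]:
        print(name)
        for row in sorted(rel):
            print("  ", row)
```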
Cross-Product
• Condition Join: $R \bowtie_c S = \sigma_c(R \times S)$
Joins
$S1 \bowtie_{S1.sid < R1.sid} R1$

The two sid columns are renamed sid1 and sid2 in the result:

sid1  sname   rating  age   sid2  bid  day
22    dustin  7       45.0  58    103  11/12/96
31    lubber  8       55.5  58    103  11/12/96
Joins
• Equi-Join: A special case of condition join where the condition c contains only equalities.
• Result schema is similar to that of cross-product; the output has duplicate column names, so the renaming operator is needed.

S1
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

R1
sid  bid  day
22   101  10/10/96
58   103  11/12/96

$S1 \bowtie_{sid} R1$
Need to apply the renaming operator! (sid becomes sid1 and sid2)
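Both kinds of join can be sketched as a selection over a cross-product, following $R \bowtie_c S = \sigma_c(R \times S)$. The helper names `theta_join` and `joined_schema` are invented for this example, and the numbering scheme for duplicate field names (sid1, sid2) is one possible renaming, not the only one.

```python
S1 = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)}
R1 = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}


def theta_join(r, s, cond):
    """Condition join: cross-product filtered by cond(left_row, right_row)."""
    return {a + b for a in r for b in s if cond(a, b)}


def joined_schema(left, right):
    """Rename field names that occur on both sides by appending 1 or 2."""
    dup = set(left) & set(right)
    return ([f"{n}1" if n in dup else n for n in left]
            + [f"{n}2" if n in dup else n for n in right])


# Condition join on S1.sid < R1.sid (sid is position 0 on both sides).
condition_join = theta_join(S1, R1, lambda s, r: s[0] < r[0])

# Equi-join: the condition contains only an equality.
equi_join = theta_join(S1, R1, lambda s, r: s[0] == r[0])

schema = joined_schema(["sid", "sname", "rating", "age"], ["sid", "bid", "day"])

if __name__ == "__main__":
    print(schema)
    for row in sorted(condition_join):
        print(row)
    for row in sorted(equi_join):
        print(row)
```

A real system would not build the full cross-product first; the sketch only mirrors the algebraic definition.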
Joins
• Removes duplicate columns.
Examples of Division A/B
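As a reminder of what A/B computes: if A has fields (x, y) and B has field y, then A/B contains every x that is paired in A with all y values in B. The sketch below implements this directly and also through the basic operators, using $A/B = \pi_x(A) - \pi_x((\pi_x(A) \times B) - A)$. The data and function names are illustrative assumptions, not the instances from the lecture.

```python
# Illustrative data: A pairs a supplier with a part, B lists required parts.
A = {
    ("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
    ("s2", "p1"), ("s2", "p2"),
    ("s3", "p2"),
    ("s4", "p2"), ("s4", "p4"),
}


def divide(a, b):
    """Direct definition: x values paired with every y in b."""
    xs = {x for (x, _) in a}
    return {x for x in xs if all((x, y) in a for y in b)}


def divide_basic_ops(a, b):
    """Same result using projection, cross-product and set-difference."""
    xs = {x for (x, _) in a}                    # projection of a on x
    all_pairs = {(x, y) for x in xs for y in b}  # cross-product with b
    missing = all_pairs - a                      # pairs that a lacks
    disqualified = {x for (x, _) in missing}     # x values missing some y
    return xs - disqualified


if __name__ == "__main__":
    for b in ({"p2"}, {"p2", "p4"}, {"p1", "p2", "p4"}):
        print(sorted(b), sorted(divide(A, b)))
```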
Find names of sailors who’ve reserved boat #103
• Can identify all red or green boats, then find sailors who’ve
reserved one of these boats:
$\rho(Tempboats, (\sigma_{color = 'red' \lor color = 'green'} Boats))$
$\pi_{sname}(Tempboats \bowtie Reserves \bowtie Sailors)$
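The two steps of this query can be followed in Python. This is a sketch with assumed data: Sailors and Reserves reuse the S1 and R1 instances from earlier in the lecture, while the Boats rows are hypothetical because no Boats instance appears in these notes.

```python
from typing import NamedTuple


class Sailor(NamedTuple):
    sid: int
    sname: str
    rating: int
    age: float


class Reserve(NamedTuple):
    sid: int
    bid: int
    day: str


class Boat(NamedTuple):
    bid: int
    bname: str
    color: str


SAILORS = {Sailor(22, "dustin", 7, 45.0), Sailor(31, "lubber", 8, 55.5),
           Sailor(58, "rusty", 10, 35.0)}
RESERVES = {Reserve(22, 101, "10/10/96"), Reserve(58, 103, "11/12/96")}
# Hypothetical Boats instance, invented for this example.
BOATS = {Boat(101, "Interlake", "blue"), Boat(102, "Interlake", "red"),
         Boat(103, "Clipper", "green"), Boat(104, "Marine", "red")}

# Step 1: Tempboats = boats whose color is red or green.
tempboats = {b for b in BOATS if b.color in ("red", "green")}

# Step 2: join Tempboats with Reserves on bid and with Sailors on sid,
# then project on sname.
names = {
    s.sname
    for b in tempboats
    for r in RESERVES
    for s in SAILORS
    if b.bid == r.bid and r.sid == s.sid
}

if __name__ == "__main__":
    print(sorted(names))
```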