Lecture4 Algebra
Image source: http://en.wikipedia.org/wiki/File:Edgar_F_Codd.jpg
• Drinkers frequent bars “X” times a week
• Bars serve beers at price “Y”
• Drinkers: each has an address
• Beers: each has a brewer
Bar
name            address
The Edge        108 Morris Street
Satisfaction    905 W. Main Street

Beer
name         brewer
Budweiser    Anheuser-Busch Inc.
Corona       Grupo Modelo
Dixie        Dixie Brewing
“BEERS” AS A RELATIONAL DATABASE

Drinker
name    address
Amy     100 W. Main Street
Ben     101 W. Main Street
Dan     300 N. Duke Street

Frequents
drinker    bar             times_a_week
Ben        Satisfaction    2
Dan        The Edge        1
Dan        Satisfaction    2

Likes, Serves
Simplicity is a virtue!
Ordering of rows doesn’t matter (even though output is always in some order)
• Instance
• Represents the data content
• Changes rapidly, but always conforms to the schema
RelOp
• Core operators:
• Selection, projection, cross product, union, difference,
and renaming
• Additional, derived operators:
• Join, natural join, intersection, etc.
• Compose operators to make complex queries
Input: a table 𝑅
Notation: 𝜎_𝑝 𝑅
• 𝑝 is called a selection condition (or predicate)
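To make the selection operator concrete, here is a minimal Python sketch. It assumes a table is modelled as a list of dictionaries (one per row) and that the predicate is an ordinary Python function; the helper name `select` and the condition price < 3.00 are illustrative choices, not part of the lecture.

```python
# Rows of Serves(bar, beer, price) from the lecture's running example.
serves = [
    {"bar": "The Edge", "beer": "Budweiser", "price": 2.50},
    {"bar": "The Edge", "beer": "Corona", "price": 3.00},
    {"bar": "Satisfaction", "beer": "Budweiser", "price": 2.25},
]


def select(relation, predicate):
    """sigma_p(R): keep exactly the rows for which the predicate holds."""
    return [row for row in relation if predicate(row)]


# Hypothetical selection condition: price < 3.00.
cheap = select(serves, lambda row: row["price"] < 3.00)

for row in cheap:
    print(row)
```

The output has the same columns as the input; only the set of rows shrinks.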
Notation: 𝜋_𝐿 𝑅
• 𝐿 is a list of columns in 𝑅
Example: 𝜋_beer,price Serves

Serves
bar             beer         price
The Edge        Budweiser    2.50
The Edge        Corona       3.00
Satisfaction    Budweiser    2.25

Output
beer         price
Budweiser    2.50
Corona       3.00
Budweiser    2.25
Output of 𝜋_beer Serves?
Duplicate output rows are removed (by definition)
Example: beer on Serves
𝜋_beer Serves

Serves
bar             beer         price
The Edge        Budweiser    2.50
The Edge        Corona       3.00
Satisfaction    Budweiser    2.25

Output
beer
Budweiser
Corona
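The duplicate-removal rule can be checked with a short Python sketch. It assumes tables are lists of dictionaries; the helper name `project` is an illustrative choice, not something defined in the lecture.

```python
# Rows of Serves(bar, beer, price) from the lecture's running example.
serves = [
    {"bar": "The Edge", "beer": "Budweiser", "price": 2.50},
    {"bar": "The Edge", "beer": "Corona", "price": 3.00},
    {"bar": "Satisfaction", "beer": "Budweiser", "price": 2.25},
]


def project(relation, columns):
    """pi_L(R): keep only the listed columns and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[c] for c in columns)
        if key not in seen:  # set semantics: each output row appears once
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result


beers = project(serves, ["beer"])
beer_prices = project(serves, ["beer", "price"])

print(beers)
print(beer_prices)
```

Projecting on beer alone collapses the two Budweiser rows into one, while projecting on beer and price keeps all three rows because the prices differ.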
Notation: 𝑅 × 𝑆
Bar × Frequents
name            address               drinker    bar             times_a_week
The Edge        108 Morris Street     Ben        Satisfaction    2
The Edge        108 Morris Street     Dan        The Edge        1
The Edge        108 Morris Street     Dan        Satisfaction    2
Satisfaction    905 W. Main Street    Ben        Satisfaction    2
Satisfaction    905 W. Main Street    Dan        The Edge        1
Satisfaction    905 W. Main Street    Dan        Satisfaction    2

= Frequents × Bar
drinker    bar             times_a_week    name            address
Ben        Satisfaction    2               The Edge        108 Morris Street
Dan        The Edge        1               The Edge        108 Morris Street
Dan        Satisfaction    2               The Edge        108 Morris Street
Ben        Satisfaction    2               Satisfaction    905 W. Main Street
Dan        The Edge        1               Satisfaction    905 W. Main Street
Dan        Satisfaction    2               Satisfaction    905 W. Main Street
So cross product is commutative: for any 𝑅 and 𝑆, 𝑅 × 𝑆 = 𝑆 × 𝑅 (up to the ordering of columns)
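A small Python sketch can confirm the commutativity claim on the Bar and Frequents data. It assumes rows are dictionaries, so column order is irrelevant by construction; the helper names `cross` and `as_set` are illustrative, not from the lecture.

```python
from itertools import product

bar = [
    {"name": "The Edge", "address": "108 Morris Street"},
    {"name": "Satisfaction", "address": "905 W. Main Street"},
]
frequents = [
    {"drinker": "Ben", "bar": "Satisfaction", "times_a_week": 2},
    {"drinker": "Dan", "bar": "The Edge", "times_a_week": 1},
    {"drinker": "Dan", "bar": "Satisfaction", "times_a_week": 2},
]


def cross(r, s):
    """R x S: pair every row of R with every row of S.

    Assumes the two tables have no column names in common.
    """
    return [{**x, **y} for x, y in product(r, s)]


def as_set(relation):
    """Order-insensitive view: a set of rows, each a frozenset of (column, value)."""
    return {frozenset(row.items()) for row in relation}


left = cross(bar, frequents)
right = cross(frequents, bar)

print(len(left), len(right))
print(as_set(left) == as_set(right))
```

Both products contain 2 × 3 = 6 rows, and they are equal once row order and column order are ignored.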
(Also known as “theta-join”: most general joins)
• Input: two tables 𝑅 and 𝑆
• Notation: 𝑅 ⋈_𝑝 𝑆 (one of the most important operations!)
• 𝑝 is called a join condition (or predicate)
• Purpose: relate rows from two tables according to some criteria
• Output: for each row 𝑟 in 𝑅 and each row 𝑠 in 𝑆, output a row 𝑟𝑠 if 𝑟 and 𝑠
satisfy 𝑝
• Shorthand for 𝜎_𝑝(𝑅 × 𝑆)
• If the predicate 𝑝 uses only equality comparisons (e.g., A = 5 ∧ B = 7), the join is called an equijoin
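The "shorthand for selection over cross product" reading can be sketched directly in Python. The code below assumes rows are dictionaries with disjoint column names across the two tables; `theta_join` is an illustrative helper name, and a real system would not materialize the full cross product first.

```python
bar = [
    {"name": "The Edge", "address": "108 Morris Street"},
    {"name": "Satisfaction", "address": "905 W. Main Street"},
]
frequents = [
    {"drinker": "Ben", "bar": "Satisfaction", "times_a_week": 2},
    {"drinker": "Dan", "bar": "The Edge", "times_a_week": 1},
    {"drinker": "Dan", "bar": "Satisfaction", "times_a_week": 2},
]


def theta_join(r, s, predicate):
    """R join_p S, computed literally as sigma_p(R x S)."""
    pairs = [{**x, **y} for x in r for y in s]        # cross product R x S
    return [row for row in pairs if predicate(row)]   # selection sigma_p


# Equijoin from the lecture: Frequents joined with Bar on bar = name.
joined = theta_join(frequents, bar, lambda row: row["bar"] == row["name"])

for row in joined:
    print(row["drinker"], row["bar"], row["times_a_week"], row["address"])
```

Each of the three Frequents rows matches exactly one bar, so the result has three rows, each extended with that bar's address.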
Example: extend the Frequents relation with the addresses of the bars
𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑡𝑠 ⋈_bar = name 𝐵𝑎𝑟
Ambiguous attribute? Prefix a column reference with the table name and “.” to disambiguate identically named columns from different tables, e.g., Bar.name
Bar
name            address
The Edge        108 Morris Street
Satisfaction    905 W. Main Street

Frequents
drinker    bar             times_a_week
Ben        Satisfaction    2
Dan        The Edge        1
Dan        Satisfaction    2
Serves
bar             beer         price
The Edge        Budweiser    2.50
The Edge        Corona       3.00
Satisfaction    Budweiser    2.25

Likes
drinker    beer
Amy        Corona
Dan        Budweiser
Dan        Corona
Ben        Budweiser
𝑆𝑒𝑟𝑣𝑒𝑠 ⋈ 𝐿𝑖𝑘𝑒𝑠
bar         beer         price    drinker
The Edge    Budweiser    2.50     Dan
The Edge    Budweiser    2.50     Ben
The Edge    Corona       3.00     Amy
The Edge    Corona       3.00     Dan
...         ...          ...      ...

• The natural join is on beer
• Only one column for beer appears in the output
• What happens if the tables have two or more common columns?
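As a rough illustration of the natural join, the Python sketch below equates all columns the two tables share and keeps a single copy of each. It assumes rows are dictionaries and that every row of a table has the same keys; `natural_join` is an illustrative helper name.

```python
serves = [
    {"bar": "The Edge", "beer": "Budweiser", "price": 2.50},
    {"bar": "The Edge", "beer": "Corona", "price": 3.00},
    {"bar": "Satisfaction", "beer": "Budweiser", "price": 2.25},
]
likes = [
    {"drinker": "Amy", "beer": "Corona"},
    {"drinker": "Dan", "beer": "Budweiser"},
    {"drinker": "Dan", "beer": "Corona"},
    {"drinker": "Ben", "beer": "Budweiser"},
]


def natural_join(r, s):
    """R natural-join S: equate every column name the two tables share."""
    if not r or not s:
        return []
    common = [c for c in r[0] if c in s[0]]  # shared column names
    return [
        {**x, **y}  # merging dicts keeps one copy of each shared column
        for x in r
        for y in s
        if all(x[c] == y[c] for c in common)
    ]


result = natural_join(serves, likes)

for row in result:
    print(row["bar"], row["beer"], row["price"], row["drinker"])
```

With two or more common columns, the same code requires all of them to match at once; with none, the condition is vacuously true and the result degenerates to the cross product.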
Input: two tables 𝑅 and 𝑆
Notation: 𝑅 ∪ 𝑆
Output:
Example on board
Input: two tables 𝑅 and 𝑆
Notation: 𝑅 ∩ 𝑆
• 𝑅 and 𝑆 must have identical schema
Output:
• Has the same schema as 𝑅 and 𝑆
• Contains all rows that are in both 𝑅 and 𝑆
How can you write it using other operators?
Shorthand for 𝑅 − (𝑅 − 𝑆)
Also equivalent to 𝑆 − (𝑆 − 𝑅)
And to 𝑅 ⋈ 𝑆
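These equivalences are easy to check with Python's built-in sets, which already follow set semantics. The two small tables below are hypothetical (drinker, beer) rows chosen only for illustration.

```python
# Two hypothetical tables with identical schema (drinker, beer).
r = {("Amy", "Corona"), ("Dan", "Budweiser"), ("Dan", "Corona")}
s = {("Dan", "Corona"), ("Ben", "Budweiser"), ("Amy", "Corona")}

via_r = r - (r - s)   # R - (R - S)
via_s = s - (s - r)   # S - (S - R)

# Natural join on identical schemas: every column must match,
# so a pair of rows joins only when the two rows are equal.
via_join = {x for x in r for y in s if x == y}

print(via_r == via_s == via_join == (r & s))
```

All three expressions yield the rows that appear in both tables.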
Operator tree (also called a logical plan tree)
Input tables: Bar(name, address), Frequents(drinker, bar, times_a_week)

            𝜋_address
                |
          ⋈_bar = name
           /         \
        𝐵𝑎𝑟     𝜎_drinker = ‘Dan’
                       |
                  𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑡𝑠

Equivalent to
𝜋_address(𝐵𝑎𝑟 ⋈_bar = name (𝜎_drinker = ‘Dan’ 𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑡𝑠))
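The plan can be evaluated bottom-up, one operator at a time. The Python sketch below is one way to do that on the lecture's Bar and Frequents data; the variable names are illustrative, and rows are assumed to be dictionaries.

```python
bar = [
    {"name": "The Edge", "address": "108 Morris Street"},
    {"name": "Satisfaction", "address": "905 W. Main Street"},
]
frequents = [
    {"drinker": "Ben", "bar": "Satisfaction", "times_a_week": 2},
    {"drinker": "Dan", "bar": "The Edge", "times_a_week": 1},
    {"drinker": "Dan", "bar": "Satisfaction", "times_a_week": 2},
]

# Leaf-to-root evaluation of the plan tree.
# 1. Selection: rows of Frequents with drinker = 'Dan'.
dan_rows = [f for f in frequents if f["drinker"] == "Dan"]

# 2. Join: Bar joined with the selected rows on bar = name.
joined = [{**b, **f} for b in bar for f in dan_rows if f["bar"] == b["name"]]

# 3. Projection on address; a set removes duplicates.
addresses = {row["address"] for row in joined}

print(sorted(addresses))
```

The result is the set of addresses of the bars that Dan frequents.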
• Selection: 𝜎_𝑝 𝑅
• Projection: 𝜋_𝐿 𝑅
• Cross product: 𝑅 × 𝑆
• Union: 𝑅 ∪ 𝑆
• Difference: 𝑅 − 𝑆
• Renaming: 𝜌_𝑆(A1, A2, …) 𝑅
  • Does not really add “processing” power
SUMMARY OF DERIVED OPERATORS
• Join: 𝑅 ⋈_𝑝 𝑆
• Natural join: 𝑅 ⋈ 𝑆
• Intersection: 𝑅 ∩ 𝑆
• Many more
  • Semijoin, anti-semijoin, quotient, …
Exercise
Frequents(drinker, bar, times_a_week)
Bar(name, address)
Drinker(name, address)
[Operator tree; recoverable nodes: 𝜌_bar, 𝜋_bar, ⋈_drinker = name, 𝜋_name, 𝜎_address = ‘300 N. Duke Street’, with leaves 𝐵𝑎𝑟, 𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑡𝑠, and 𝐷𝑟𝑖𝑛𝑘𝑒𝑟]
A trickier exercise
Frequents(drinker, bar, times_a_week)
Bar(name, address)
Drinker(name, address)
For each bar, find the drinkers who frequent it the maximum number of times a week
• Who do NOT visit a bar the maximum number of times?
• Those whose times_a_week is lower than somebody else’s for the same bar
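The hint suggests a "subtract the losers" strategy: take all (drinker, bar) pairs and remove those that are beaten by somebody else at the same bar. Here is a minimal Python sketch of that idea, using set difference. The Amy row is a hypothetical addition (it is not in the lecture's Frequents table) so that at least one row is eliminated.

```python
frequents = [
    ("Ben", "Satisfaction", 2),
    ("Dan", "The Edge", 1),
    ("Dan", "Satisfaction", 2),
    ("Amy", "Satisfaction", 1),  # hypothetical row, added for illustration
]

# Projection of Frequents on (drinker, bar).
all_pairs = {(d, b) for d, b, _ in frequents}

# Self-join of Frequents on the same bar with t1 < t2, projected on the
# "losing" side: drinkers who do NOT visit the bar the max number of times.
not_max = {
    (d1, b1)
    for d1, b1, t1 in frequents
    for d2, b2, t2 in frequents
    if b1 == b2 and t1 < t2
}

# Difference: whoever is never beaten at a bar attains that bar's maximum.
answer = all_pairs - not_max

print(sorted(answer))
```

Ties are handled naturally: Ben and Dan both visit Satisfaction twice a week, neither is lower than the other, so both remain in the answer.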
Can also define Tempboats using union. Try the “AND” version yourself
Division (/): not supported as a primitive operator, but useful for expressing queries like:
Find sailors who have reserved all boats.
Let A have 2 fields, x and y; B have only field y:
..... / 𝜋_bid(𝜎_bname = 'Interlake' Boats)
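Under the usual definition, A / B contains every x value such that, for every y in B, the row (x, y) appears in A. The Python sketch below computes this with the same "subtract the disqualified" pattern that works with core operators; the sailor and boat identifiers are hypothetical data, not taken from the lecture.

```python
# A(x, y): hypothetical (sailor, boat) reservations.
a = {("s1", "b1"), ("s1", "b2"), ("s2", "b1"), ("s3", "b2")}
# B(y): hypothetical set of all boats.
b = {"b1", "b2"}

# Projection of A on x: every candidate x value.
xs = {x for x, _ in a}

# x values that miss at least one y in B:
# projection on x of ((candidates x B) - A).
disqualified = {x for x in xs for y in b if (x, y) not in a}

# A / B: candidates that are never disqualified.
quotient = xs - disqualified

# Direct reading of the definition, for comparison.
direct = {x for x in xs if all((x, y) in a for y in b)}

print(quotient, quotient == direct)
```

Only s1 is paired with every boat in B, so it is the only value in the quotient.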
Add more rows to the input of a RelOp: what happens to the output?
• Selection: 𝜎_𝑝 𝑅 (monotone)
• Projection: 𝜋_𝐿 𝑅 (monotone)
• Cross product: 𝑅 × 𝑆 (monotone)
• Join: 𝑅 ⋈_𝑝 𝑆 (monotone)
• Natural join: 𝑅 ⋈ 𝑆 (monotone)
• Union: 𝑅 ∪ 𝑆 (monotone)
• Difference: 𝑅 − 𝑆 (monotone w.r.t. 𝑅; non-monotone w.r.t. 𝑆)
• Intersection: 𝑅 ∩ 𝑆 (monotone)
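An operator is monotone in an input if adding rows to that input can never remove rows from the output. The small Python sketch below checks this for difference on hypothetical sets of integers standing in for rows.

```python
# Hypothetical tables; integers stand in for rows.
r = {1, 2, 3}
s = {3}

before = r - s                 # R - S

# Add a row to R: the old output is still contained in the new one.
grow_r = (r | {4}) - s

# Add a row to S: a row that used to be in the output disappears.
grow_s = r - (s | {2})

print(before, grow_r, grow_s)
print(before <= grow_r)   # monotone w.r.t. R
print(before <= grow_s)   # not monotone w.r.t. S
```

Growing 𝑅 only adds output rows, while growing 𝑆 can remove them, which is why difference is the one core operator that is not monotone in all of its inputs.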