
CMPT 354:

Database System I
Lecture 5. Relational Algebra

1
What have we learned
• Lec 1. Database History
• Lec 2. Relational Model
• Lec 3-4. SQL

2
Why does Relational Algebra matter?
• An essential topic for understanding how query
processing and optimization work
• What happens when an SQL query is issued to a database?

• Helps you master the skills to quickly learn a new
query language
• How to quickly learn XML QL and MongoDB QL?

3
Relational Query Languages
• Query languages allow the manipulation and retrieval
of data from a database

• Traditionally: QL != programming language


• Doesn’t need to be Turing complete
• Not designed for computation
• Supports easy, efficient access to large databases

• Recent Years:
• Everything interesting involves a large data set
• QLs are quite powerful for expressing algorithms at scale
4
Formal Query Languages
• Relational Algebra
• Procedural query language
• used to represent execution plans

• Relational Calculus
• Non-procedural (declarative) query language
• Describe what you want, rather than how to compute it
• Foundation for SQL

5
Results of a Query
• Query is a function over relations

Q(R1,...,Rn) = Rresult

• The schema of the result relation is determined by the


input relation and the query
• Because the result of a query is a relation, it can be used
as input to another query

(Figure: a query Q maps input relations to a result relation, which can in turn be the input to another query.)
6
Sets vs. Bags
• Sets: {a, b, c}, {a, d, e, f}, {}, …
• Bags: {a, a, b, c}, {b, b, b, b, b}, …

• Relational Algebra has two flavors:


• Set semantics = standard Relational Algebra
• Bag semantics = extended Relational Algebra

• DB systems implement bag semantics (Why?)

7
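The difference between the two flavors can be sketched in Python, assuming a bag is modelled with `collections.Counter` (the variable names are illustrative, and the values are the birth years from the Customer table used later in the lecture). One commonly cited answer to the "Why?" is that eliminating duplicates costs extra work, such as hashing or sorting every row, so systems skip it unless asked.

```python
from collections import Counter

# Birth years from the Customer table; 1981 appears twice.
births = [1981, 1981, 1980, 1955, 1984]

as_bag = Counter(births)  # bag: remembers how many times each value occurs
as_set = set(births)      # set: each value is kept at most once

print(as_bag[1981])               # multiplicity of 1981 in the bag
print(len(births), len(as_set))   # row counts under bag vs. set semantics
```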
Relational Algebra Operators
• Core 5 operators
• Selection (σ)
• Projection (π)
• Union (∪)
• Set Difference (−)
• Cross product (×)

• Additional operators
• Rename (ρ)
• join (⨝)
• Intersect (∩)

9
Selection
• The selection operator, σ (sigma), specifies the
rows to be retained from the input relation
• A selection has the form: σ_{condition}(relation), where
condition is a Boolean expression
• Terms in the condition are comparisons between two
fields (or a field and a constant)
• Using one of the comparison operators: <, ≤, =, ≠, ≥, >
• Terms may be connected by ∧ (and) or ∨ (or)
• Terms may be negated using ¬ (not)

10
Selection Example
Customer
sin  firstName  lastName  birth
111  Buffy      Summers   1981
222  Xander     Harris    1981
333  Cordelia   Chase     1980
444  Rupert     Giles     1955
555  Dawn       Summers   1984

σ_{birth < 1981}(Customer)
sin  firstName  lastName  birth
333  Cordelia   Chase     1980
444  Rupert     Giles     1955

σ_{lastName = "Summers"}(Customer)
sin  firstName  lastName  birth
111  Buffy      Summers   1981
555  Dawn       Summers   1984

11
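The two selections on this slide can be reproduced with a minimal Python sketch. It assumes a relation is a set of named tuples and a condition is an ordinary Boolean function; the helper name `select` is hypothetical, not part of any library.

```python
from typing import NamedTuple

class Customer(NamedTuple):
    sin: int
    firstName: str
    lastName: str
    birth: int

CUSTOMER = {
    Customer(111, "Buffy", "Summers", 1981),
    Customer(222, "Xander", "Harris", 1981),
    Customer(333, "Cordelia", "Chase", 1980),
    Customer(444, "Rupert", "Giles", 1955),
    Customer(555, "Dawn", "Summers", 1984),
}

def select(relation, condition):
    """σ_condition(relation): keep the rows for which condition holds."""
    return {row for row in relation if condition(row)}

born_before_1981 = select(CUSTOMER, lambda r: r.birth < 1981)
summers = select(CUSTOMER, lambda r: r.lastName == "Summers")

print(sorted(born_before_1981))
print(sorted(summers))
```

The result rows keep all four fields, which matches the note that a selection does not change the schema.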
Projection
• The projection operator, π (pi), specifies the
columns to be retained from the input relation
• A projection has the form: π_{columns}(relation)
• Where columns is a comma-separated list of column
names
• The list contains the names of the columns to be
retained in the result relation

12
Projection Example
Customer
sin  firstName  lastName  birth
111  Buffy      Summers   1981
222  Xander     Harris    1981
333  Cordelia   Chase     1980
444  Rupert     Giles     1955
555  Dawn       Summers   1984

π_{firstName, lastName}(Customer)
firstName  lastName
Buffy      Summers
Xander     Harris
Cordelia   Chase
Rupert     Giles
Dawn       Summers

π_{birth}(Customer)
birth
1981
1980
1955
1984
13
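A short Python sketch of projection under set semantics, assuming rows are dictionaries keyed by column name; `project` is a hypothetical helper. Building the result as a set is what removes the duplicate 1981.

```python
COLUMNS = ("sin", "firstName", "lastName", "birth")
ROWS = [
    (111, "Buffy", "Summers", 1981),
    (222, "Xander", "Harris", 1981),
    (333, "Cordelia", "Chase", 1980),
    (444, "Rupert", "Giles", 1955),
    (555, "Dawn", "Summers", 1984),
]
CUSTOMER = [dict(zip(COLUMNS, row)) for row in ROWS]

def project(relation, columns):
    """π_columns(relation): keep only the listed columns, as a set of tuples."""
    return {tuple(row[c] for c in columns) for row in relation}

names = project(CUSTOMER, ["firstName", "lastName"])
births = project(CUSTOMER, ["birth"])

print(len(names))    # five distinct (firstName, lastName) pairs
print(len(births))   # four distinct birth years out of five rows
```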
Selection and Projection Notes
• Selection and projection eliminate duplicates
• Since relations are sets
• Both operations require one input relation
• The schema of the result of a selection is the same as
the schema of the input relation
• The schema of the result of a projection contains just
those attributes in the projection list

14
Composing Selection and Projection
π_{sin, firstName}(σ_{birth < 1982 ∧ lastName = "Summers"}(Customer))

Customer
sin  firstName  lastName  birth
111  Buffy      Summers   1981
222  Xander     Harris    1981
333  Cordelia   Chase     1980
444  Rupert     Giles     1955
555  Dawn       Summers   1984

Intermediate relation: σ_{birth < 1982 ∧ lastName = "Summers"}(Customer)
sin  firstName  lastName  birth
111  Buffy      Summers   1981

Result
sin  firstName
111  Buffy

15
Composing Selection and Projection
Customer
sin  firstName  lastName  birth
111  Buffy      Summers   1981
222  Xander     Harris    1981
333  Cordelia   Chase     1980
444  Rupert     Giles     1955
555  Dawn       Summers   1984

π_{birth}(σ_{birth < 1981}(Customer))
birth
1980
1955

σ_{birth < 1981}(π_{birth}(Customer))
birth
1980
1955
16
Commutative property
• For example:
• x+y=y+x
• x*y=y*x

• Does it hold for projection and selection?


π_{columns}(σ_{condition}(R)) = σ_{condition}(π_{columns}(R)) ?

• What about
π_{firstName}(σ_{birth < 1981}(Customer))?
17
Commutative property
Customer
sin  firstName  lastName  birth
111  Buffy      Summers   1981
222  Xander     Harris    1981
333  Cordelia   Chase     1980
444  Rupert     Giles     1955
555  Dawn       Summers   1984

π_{firstName}(σ_{birth < 1981}(Customer))
firstName
Cordelia
Rupert

σ_{birth < 1981}(π_{firstName}(Customer))
Not valid: birth is no longer available once the projection has dropped it.

π_{firstName}(σ_{birth < 1981}(π_{firstName, birth}(Customer)))
firstName
Cordelia
Rupert
18
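The question of when selection and projection can be swapped can be checked with a small Python sketch. It assumes rows are dictionaries and uses hypothetical helpers `select` and `project`; the swap fails exactly when the projection drops an attribute that the condition still needs.

```python
COLUMNS = ("sin", "firstName", "lastName", "birth")
ROWS = [
    (111, "Buffy", "Summers", 1981),
    (222, "Xander", "Harris", 1981),
    (333, "Cordelia", "Chase", 1980),
    (444, "Rupert", "Giles", 1955),
    (555, "Dawn", "Summers", 1984),
]
CUSTOMER = [dict(zip(COLUMNS, row)) for row in ROWS]

def select(relation, condition):
    return [row for row in relation if condition(row)]

def project(relation, columns):
    seen, result = set(), []
    for row in relation:
        key = tuple(row[c] for c in columns)
        if key not in seen:          # set semantics: drop duplicate rows
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result

def before_1981(row):
    return row["birth"] < 1981

# Select first, then project.
plan_a = project(select(CUSTOMER, before_1981), ["firstName"])

# Project first, but keep birth so the selection can still be evaluated.
plan_b = project(
    select(project(CUSTOMER, ["firstName", "birth"]), before_1981),
    ["firstName"],
)

# A naive swap drops birth before the selection needs it.
try:
    select(project(CUSTOMER, ["firstName"]), before_1981)
    swap_worked = True
except KeyError:
    swap_worked = False

print(plan_a == plan_b, swap_worked)
```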
Set Operations Review
A = {1, 3, 6} B = {1, 2, 5, 6}

Union (∪)           A ∪ B ≡ B ∪ A    A ∪ B = {1, 2, 3, 5, 6}

Intersection (∩)    A ∩ B ≡ B ∩ A    A ∩ B = {1, 6}

Set Difference (−)  A − B ≠ B − A    A − B = {3}    B − A = {2, 5}

19
Union Compatible Relations
A op B = Rresult

• where op = ∪, ∩, or −
• A and B must be union compatible
• Same number of fields
• Field i in each schema has the same type

20
Union Compatible Relations
Intersection of the Employee and Customer relations

Customer
sin  firstName  lastName  birth
111  Buffy      Summers   1981
222  Xander     Harris    1981
333  Cordelia   Chase     1980
444  Rupert     Giles     1955
555  Dawn       Summers   1984

Employee
sin  firstName  lastName  salary
208  Clark      Kent      80000.55
111  Buffy      Summers   22000.78
412  Carol      Danvers   64000.00

The two relations are not union compatible, as birth is a DATE and salary is a REAL.

We can carry out preliminary operations to make the relations union compatible:

π_{sin, firstName, lastName}(Customer) ∩ π_{sin, firstName, lastName}(Employee)

21
Union Compatible Relations
A op B = Rresult

• where op = ∪, ∩, or −
• A and B must be union compatible
• Same number of fields
• Field i in each schema has the same type

• Result schema borrowed from A

A(age int) ∪ B(num int) = Rresult(age int)


22
Union
A
sin  firstName  lastName
111  Buffy      Summers
222  Xander     Harris
333  Cordelia   Chase
444  Rupert     Giles
555  Dawn       Summers

B
sin  firstName  lastName
208  Clark      Kent
111  Buffy      Summers
412  Carol      Danvers

A ∪ B
sin  firstName  lastName
111  Buffy      Summers
222  Xander     Harris
333  Cordelia   Chase
444  Rupert     Giles
555  Dawn       Summers
208  Clark      Kent
412  Carol      Danvers
23
Set Difference
A
sin  firstName  lastName
111  Buffy      Summers
222  Xander     Harris
333  Cordelia   Chase
444  Rupert     Giles
555  Dawn       Summers

B
sin  firstName  lastName
208  Clark      Kent
111  Buffy      Summers
412  Carol      Danvers

A − B
sin  firstName  lastName
222  Xander     Harris
333  Cordelia   Chase
444  Rupert     Giles
555  Dawn       Summers

B − A
sin  firstName  lastName
208  Clark      Kent
412  Carol      Danvers
24
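The union-compatibility rule and the union and difference examples can be sketched in Python as follows. Schemas are assumed to be lists of (name, type) pairs with type names written as strings; the INT and TEXT types for sin and the name fields are assumptions (the slides only state DATE and REAL), and `union_compatible` is a hypothetical helper.

```python
def union_compatible(schema_a, schema_b):
    """Same number of fields, and field i has the same type in both schemas."""
    return len(schema_a) == len(schema_b) and all(
        type_a == type_b
        for (_, type_a), (_, type_b) in zip(schema_a, schema_b)
    )

NAME_SCHEMA = [("sin", "INT"), ("firstName", "TEXT"), ("lastName", "TEXT")]
CUSTOMER_SCHEMA = NAME_SCHEMA + [("birth", "DATE")]
EMPLOYEE_SCHEMA = NAME_SCHEMA + [("salary", "REAL")]

A = {
    (111, "Buffy", "Summers"),
    (222, "Xander", "Harris"),
    (333, "Cordelia", "Chase"),
    (444, "Rupert", "Giles"),
    (555, "Dawn", "Summers"),
}
B = {
    (208, "Clark", "Kent"),
    (111, "Buffy", "Summers"),
    (412, "Carol", "Danvers"),
}

print(union_compatible(CUSTOMER_SCHEMA, EMPLOYEE_SCHEMA))  # DATE vs. REAL
print(union_compatible(NAME_SCHEMA, NAME_SCHEMA))

union = A | B        # A ∪ B: Buffy Summers appears only once
a_minus_b = A - B    # A − B
b_minus_a = B - A    # B − A
print(len(union), len(a_minus_b), len(b_minus_a))
```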
Note on Set Difference

• Notice that most operators are monotonic
• Increasing the size of the inputs → outputs grow (or stay the same)

• Set Difference is non-monotonic
• Example: A − B
• Increasing the size of B could decrease the output size

• Set difference is blocking:
• For A − B, must wait for all B tuples before producing any results

25
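Both properties can be illustrated with a small Python sketch, assuming relations are plain collections of values. The generator `difference` is a hypothetical implementation, not how any particular system does it: it has to consume all of B before it can emit the first tuple of A.

```python
A = {1, 3, 6}
B_small = {1}
B_large = {1, 3, 6}

# Union is monotonic: a larger input never removes tuples from the output.
union_is_monotonic = (A | B_small) <= (A | B_large)

# Set difference is not: growing B shrinks A − B.
small_result = A - B_small
large_result = A - B_large

def difference(a_rows, b_rows):
    """A − B as a stream over A."""
    b_seen = set(b_rows)       # blocking step: every B tuple is read first
    for row in a_rows:         # only now can A tuples flow to the output
        if row not in b_seen:
            yield row

print(union_is_monotonic, len(small_result), len(large_result))
print(list(difference([1, 3, 6], [1])))
```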
Intersection
A
sin  firstName  lastName
111  Buffy      Summers
222  Xander     Harris
333  Cordelia   Chase
444  Rupert     Giles
555  Dawn       Summers

B
sin  firstName  lastName
208  Clark      Kent
111  Buffy      Summers
412  Carol      Danvers

A ∩ B
sin  firstName  lastName
111  Buffy      Summers
26
Note on Intersect

• A ∩ B = Rresult

(Venn diagram of the overlapping sets A and B)

• Can we express it using other operators?
• A ∩ B = ?

27
Note on Intersect

• Answer: A ∩ B = A − (A − B)

(Venn diagram: A − B is the part of A that lies outside B)

28
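The identity can be checked on the sets from the Set Operations Review slide with a few lines of Python; this is a sanity check on one example, not a proof.

```python
A = {1, 3, 6}
B = {1, 2, 5, 6}

via_difference = A - (A - B)   # remove from A everything that is not in B
built_in = A & B               # Python's own set intersection

print(via_difference == built_in)
```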
Cartesian Product
A(a1, …, an) × B(an+1, …, am) = Rresult(a1, …, am)

• Each row of A is paired with each row of B
• Result schema concatenates A's and B's fields
• Names are inherited if possible (i.e., if not duplicated)
• If two field names are the same (i.e., a naming conflict occurs), the
affected columns are referred to by position
• If A contains |A| records and B contains |B| records, the result
relation will contain |A| × |B| records

29
Cartesian Product Example
σ_{lastName = "Summers"}(Customer)
sin  firstName  lastName  birth
111  Buffy      Summers   1981
555  Dawn       Summers   1984

Account
acc  type  balance   sin
01   CHQ   2101.76   111
02   SAV   11300.03  333
03   CHQ   20621.00  444

σ_{lastName = "Summers"}(Customer) × Account
1    firstName  lastName  birth  acc  type  balance   8
111  Buffy      Summers   1981   01   CHQ   2101.76   111
111  Buffy      Summers   1981   02   SAV   11300.03  333
111  Buffy      Summers   1981   03   CHQ   20621.00  444
555  Dawn       Summers   1984   01   CHQ   2101.76   111
555  Dawn       Summers   1984   02   SAV   11300.03  333
555  Dawn       Summers   1984   03   CHQ   20621.00  444
30
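The product on this slide can be reproduced with `itertools.product`, assuming rows are plain tuples so that concatenating two rows concatenates their fields; `cross` is a hypothetical helper. Both sin columns survive, which is why the slide refers to them by position (1 and 8).

```python
from itertools import product

SUMMERS = [
    (111, "Buffy", "Summers", 1981),
    (555, "Dawn", "Summers", 1984),
]
ACCOUNT = [
    ("01", "CHQ", 2101.76, 111),
    ("02", "SAV", 11300.03, 333),
    ("03", "CHQ", 20621.00, 444),
]

def cross(r, s):
    """R × S: pair every row of R with every row of S."""
    return [row_r + row_s for row_r, row_s in product(r, s)]

result = cross(SUMMERS, ACCOUNT)
print(len(result))    # 2 rows × 3 rows
print(result[0])
```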
Renaming
• It is sometimes useful to assign names to the
results of a relational algebra query
• The rename operator, ρ (rho)
• ρ_{S}(R) renames a relation
• ρ_{S(a1, a2, …, an)}(R) renames a relation and its attributes
• ρ_{new/old}(R) renames specified attributes
• Example: ρ_{sid1/1, sid2/8}(R) gives the columns at positions 1 and 8 of R the names sid1 and sid2

31
Largest Balance
• Find the account with the largest balance; return
accNumber
1. Find accounts whose balance is less than that of some other
account
σ_{Account.balance < d.balance}(Account × ρ_{d}(Account))

2. Use set difference to find the account with the
largest balance
π_{accNumber}(Account) −
π_{Account.accNumber}(σ_{Account.balance < d.balance}(Account × ρ_{d}(Account)))

32
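The two steps can be traced in Python, assuming Account is reduced to (accNumber, balance) pairs and reusing the balances from the Cartesian product example. The second loop variable `d` plays the role of the renamed copy ρ_{d}(Account).

```python
# (accNumber, balance) pairs; values taken from the earlier Account example.
ACCOUNT = {("01", 2101.76), ("02", 11300.03), ("03", 20621.00)}

# Step 1: pair every account with every account (the self cross product)
# and keep the accNumber of any account that is smaller than some other one.
not_largest = {
    acc_number
    for (acc_number, balance) in ACCOUNT
    for (_, d_balance) in ACCOUNT
    if balance < d_balance
}

# Step 2: subtract those from the set of all account numbers.
all_accounts = {acc_number for (acc_number, _) in ACCOUNT}
largest = all_accounts - not_largest

print(largest)
```

The formulation needs no aggregation operator such as max; it only uses the core operators plus rename.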
Relational Algebra Operators
• Core 5 operations
• Selection (σ)
• Projection (π)
• Union (∪)
• Set Difference (−)
• Cross product (×)

• Additional operations
• Rename (ρ)
• Intersect (∩)
• Join (⨝ )

33
Relational Algebra Exercises
• Student (sID, lastName, firstName, cgpa)
• 101, Jordan, Michael, 3.8
• Offering (oID, dept, cNum, term, instructor)
• abc, CMPT, 354, Fall 2018, Jiannan
• Took (sID, oID, grade)
• 101, abc, 95

1. sID of all students who have earned some grade over 80 and some grade below 50.

π_{sID}(σ_{grade > 80}(Took)) ∩ π_{sID}(σ_{grade < 50}(Took))


34
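A Python sketch of exercise 1 shows why the answer intersects two projections instead of using one selection with both conditions. Only the row (101, abc, 95) comes from the slide; the other Took rows are made-up test data.

```python
# (sID, oID, grade); every row except (101, "abc", 95) is hypothetical.
TOOK = {
    (101, "abc", 95),
    (101, "def", 40),
    (102, "abc", 85),
    (103, "def", 45),
}

high = {sid for (sid, _, grade) in TOOK if grade > 80}   # π_sID(σ_grade>80(Took))
low = {sid for (sid, _, grade) in TOOK if grade < 50}    # π_sID(σ_grade<50(Took))
answer = high & low

# A single selection tests both conditions on the same row, so it is always empty.
single_selection = {sid for (sid, _, grade) in TOOK if grade > 80 and grade < 50}

print(answer, single_selection)
```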
Relational Algebra Exercises
• Student (sID, lastName, firstName, cgpa)
• 101, Jordan, Michael, 3.8
• Offering (oID, dept, cNum, term, instructor)
• abc, CMPT, 354, Fall 2018, Jiannan
• Took (sID, oID, grade)
• 101, abc, 95

2. Student number of all students who have taken CMPT 354

π_{sID}(σ_{Offering.oID = Took.oID ∧ dept = 'CMPT' ∧ cNum = 354}(Offering × Took))


35
(Inner) Joins
• Motivation
• Simplify some queries that require a Cartesian product

• Natural Join: R ⨝ S = πA (σθ(R × S))

• Theta Join: R ⨝θ S = σθ (R × S)

• Equijoin: R ⨝θ S = σθ (R × S)
• Join condition θ consists only of equalities

36
Natural Join
• There is often a natural way to join two relations
• Join based on common attributes
• Eliminate duplicate common attributes from the result
Customer
sin  firstName  lastName  birth
111  Buffy      Summers   1981
222  Xander     Harris    1981
333  Cordelia   Chase     1980
444  Rupert     Giles     1955

Employee
sin  firstName  lastName  salary
208  Clark      Kent      80000.55
111  Buffy      Summers   22000.78
396  Dawn       Allen     41000.21

Customer ⋈ Employee
sin  firstName  lastName  birth  salary
111  Buffy      Summers   1981   22000.78
37
Natural Join
R⋈S
• Meaning: R ⋈ S = πA(σθ (R × S))
• Where:
• Selection σθ checks equality of all common attributes
(i.e., attributes with same names)
• Projection πA eliminates duplicate common attributes
• The natural join of two tables with no fields in
common is the Cartesian product
• Not the empty set
38
Natural Join Example
R
A    B         C        D
111  Buffy     Summers  1981
222  Xander    Harris   1981
333  Cordelia  Chase    1980
444  Rupert    Giles    1955

S
A    B      C        E
208  Clark  Kent     80000.55
111  Buffy  Summers  22000.78
396  Dawn   Allen    41000.21

R ⋈ S = π_{A,B,C,D,E}(σ_{R.A=S.A ∧ R.B=S.B ∧ R.C=S.C}(R × S))

A    B      C        D     E
111  Buffy  Summers  1981  22000.78

39
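The definition of natural join can be written as a nested loop in Python. This is a sketch that assumes rows are dictionaries keyed by attribute name; `natural_join` is a hypothetical helper and makes no attempt at efficiency.

```python
def natural_join(r, s):
    """R ⋈ S: match on all attributes with the same name, keep one copy of each."""
    result = []
    for row_r in r:
        for row_s in s:
            common = row_r.keys() & row_s.keys()
            if all(row_r[c] == row_s[c] for c in common):   # σθ over the common attributes
                result.append({**row_r, **row_s})           # πA: merged row, no duplicates
    return result

CUSTOMER = [
    {"sin": 111, "firstName": "Buffy", "lastName": "Summers", "birth": 1981},
    {"sin": 222, "firstName": "Xander", "lastName": "Harris", "birth": 1981},
    {"sin": 333, "firstName": "Cordelia", "lastName": "Chase", "birth": 1980},
    {"sin": 444, "firstName": "Rupert", "lastName": "Giles", "birth": 1955},
]
EMPLOYEE = [
    {"sin": 208, "firstName": "Clark", "lastName": "Kent", "salary": 80000.55},
    {"sin": 111, "firstName": "Buffy", "lastName": "Summers", "salary": 22000.78},
    {"sin": 396, "firstName": "Dawn", "lastName": "Allen", "salary": 41000.21},
]

joined = natural_join(CUSTOMER, EMPLOYEE)
print(joined)

# With no attribute names in common, every pair matches: a Cartesian product.
no_common = natural_join([{"a": 1}, {"a": 2}], [{"b": 3}])
print(len(no_common))
```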
Theta Join
R ⋈θ S = σθ (R × S)

• Most general form


• θ can be any condition
• No projection in this case!
• Result schema same as cross product

40
Theta Join Example
Customer
sin  firstName  lastName  birth
111  Buffy      Summers   1981
222  Xander     Harris    1981
333  Cordelia   Chase     1980
444  Rupert     Giles     1955
555  Dawn       Summers   1984

Employee
sin  firstName  lastName  salary
208  Clark      Kent      80000.55
111  Buffy      Summers   22000.78
412  Carol      Danvers   64000.00

Customer ⋈_{Customer.sin < Employee.sin} Employee
1    2         3        birth  5    6      7        salary
111  Buffy     Summers  1981   208  Clark  Kent     80000.55
111  Buffy     Summers  1981   412  Carol  Danvers  64000.00
222  Xander    Harris   1981   412  Carol  Danvers  64000.00
333  Cordelia  Chase    1980   412  Carol  Danvers  64000.00
41
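A theta join is a cross product followed by a selection, which the following Python sketch spells out. Rows are assumed to be tuples with sin at position 0, and `theta_join` is a hypothetical helper. Swapping the predicate for an equality gives an equijoin.

```python
CUSTOMER = [
    (111, "Buffy", "Summers", 1981),
    (222, "Xander", "Harris", 1981),
    (333, "Cordelia", "Chase", 1980),
    (444, "Rupert", "Giles", 1955),
    (555, "Dawn", "Summers", 1984),
]
EMPLOYEE = [
    (208, "Clark", "Kent", 80000.55),
    (111, "Buffy", "Summers", 22000.78),
    (412, "Carol", "Danvers", 64000.00),
]

def theta_join(r, s, theta):
    """R ⋈θ S = σθ(R × S); no projection, so all fields of both rows are kept."""
    return [row_r + row_s for row_r in r for row_s in s if theta(row_r, row_s)]

less_than = theta_join(CUSTOMER, EMPLOYEE, lambda c, e: c[0] < e[0])
equi = theta_join(CUSTOMER, EMPLOYEE, lambda c, e: c[0] == e[0])

print([(row[0], row[4]) for row in less_than])   # (Customer.sin, Employee.sin) pairs
print(equi)
```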
Equi-Joins
R ⋈θ S = σθ (R × S)
• A theta join where θ is an equality predicate

Customer
sin  firstName  lastName  birth
111  Buffy      Summers   1981
222  Xander     Harris    1981
333  Cordelia   Chase     1980
444  Rupert     Giles     1955

Employee
sin  firstName  lastName  salary
208  Clark      Kent      80000.55
111  Buffy      Summers   22000.78
396  Dawn       Allen     41000.21

Customer ⋈_{Customer.sin = Employee.sin} Employee
1    2      3        birth  5    6      7        salary
111  Buffy  Summers  1981   111  Buffy  Summers  22000.78
42
(Inner) Joins Summary
• Natural Join: R ⨝ S = πA (σθ(R × S))
• Equality on all fields with same name in R and in S
• Projection πA drops all redundant attributes

• Theta Join: R ⨝θ S = σθ (R × S)
• Join of R and S with a join condition θ
• Cross-product followed by selection θ
• No projection

• Equijoin: R ⨝θ S = σθ (R × S)
• Join condition θ consists only of equalities
• No projection
43
Relational Algebra Exercises
• Student (sID, lastName, firstName, cgpa)
• 101, Jordan, Michael, 3.8
• Course (dept, cNum, name, breadth)
• CMPT, 354, DB, True
• Offering (oID, dept, cNum, term, instructor)
• abc, CMPT, 354, Fall 2018, Jiannan
• Took (sID, oID, grade)
• 101, abc, 95

The names of all students who have passed a breadth course (grade >= 60 and breadth = True)
with Martin

π_{lastName, firstName}(σ_{breadth = True ∧ grade ≥ 60 ∧ instructor = 'Martin'}(Student ⨝ Took ⨝ Offering ⨝ Course))
44
Different Plans, Same Results
• Semantic equivalence: results are always the same

πname(σcNum=354 (R ⨝ S))

πname(σcNum=354 (R) ⨝ S)

• Are they equivalent?


• Which one is more efficient?
• Can you make it even more efficient?
45
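The two plans can be compared on toy data in Python. The slide does not define R and S, so the sketch assumes R(oID, cNum) and S(oID, name) with made-up rows, and it reuses a hypothetical nested-loop `natural_join`. Both plans return the same names, but filtering R first leaves fewer rows for the join to produce.

```python
def natural_join(r, s):
    """Nested-loop natural join over rows stored as dictionaries."""
    return [
        {**row_r, **row_s}
        for row_r in r
        for row_s in s
        if all(row_r[c] == row_s[c] for c in row_r.keys() & row_s.keys())
    ]

# Hypothetical relations: R(oID, cNum) and S(oID, name).
R = [
    {"oID": "abc", "cNum": 354},
    {"oID": "def", "cNum": 120},
    {"oID": "ghi", "cNum": 354},
]
S = [
    {"oID": "abc", "name": "Jordan"},
    {"oID": "def", "name": "Jordan"},
    {"oID": "def", "name": "Giles"},
    {"oID": "ghi", "name": "Chase"},
]

# Plan 1: join everything, then select, then project.
plan_1_rows = natural_join(R, S)
plan_1 = {row["name"] for row in plan_1_rows if row["cNum"] == 354}

# Plan 2: select on R first, then join, then project.
filtered_r = [row for row in R if row["cNum"] == 354]
plan_2_rows = natural_join(filtered_r, S)
plan_2 = {row["name"] for row in plan_2_rows}

print(plan_1 == plan_2, len(plan_1_rows), len(plan_2_rows))
```

On this data the join in plan 1 builds four intermediate rows and the join in plan 2 builds two, which is the intuition behind pushing selections down toward the inputs.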
Other Operators
• There are additional relational algebra operators
• Usually used in the context of query optimization
• Duplicate elimination – δ
• Used to turn a bag into a set
• Aggregation operators
• e.g. sum, average
• Grouping – γ
• Used to partition tuples into groups
• Typically used with aggregation

46
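A rough Python sketch of these extended operators, assuming Account rows are (acc, type, balance) tuples taken from the earlier example. Duplicate elimination is modelled with `set`, and grouping with a dictionary of lists that is then aggregated with `sum`.

```python
from collections import defaultdict

ACCOUNT = [
    ("01", "CHQ", 2101.76),
    ("02", "SAV", 11300.03),
    ("03", "CHQ", 20621.00),
]

# Bag projection on type keeps the repeated CHQ; δ turns the bag into a set.
types_bag = [acc_type for (_, acc_type, _) in ACCOUNT]
types_set = set(types_bag)

# γ: partition the tuples by type, then aggregate each group.
groups = defaultdict(list)
for (_, acc_type, balance) in ACCOUNT:
    groups[acc_type].append(balance)
totals = {acc_type: sum(balances) for acc_type, balances in groups.items()}

print(types_bag, sorted(types_set))
print(totals)
```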
Summary
• Relational Algebra (RA) operators
• Five core operators: selection, projection, cross-product,
union and set difference
• Additional operators are defined in terms of the core
operators: rename, intersection, join
• Theorem: SQL and RA can express exactly the same
class of queries
• Multiple RA queries can be equivalent
• Same semantics but different performance
• Form the basis for optimizations

An RDBMS translates SQL → RA, then optimizes the RA


47
Acknowledgements
• Some lecture slides were copied from or inspired by the
following course materials
• “W4111: Introduction to databases” by Eugene Wu at
Columbia University
• “CSE344: Introduction to Data Management” by Dan Suciu at
University of Washington
• “CMPT354: Database System I” by John Edgar at Simon Fraser
University
• “CS186: Introduction to Database Systems” by Joe Hellerstein
at UC Berkeley
• “CS145: Introduction to Databases” by Peter Bailis at Stanford
• “CS 348: Introduction to Database Management” by Grant
Weddell at University of Waterloo
48
