Chapter 4 - B - Relational - Algebra II
Relational
Algebra II
COMP3278 Introduction to
Database Management Systems
Section 1
Additional
operators
Slides prepared by - Dr. Chui Chun Kit, http://www.cs.hku.hk/~ckchui/ for students in COMP3278
For other uses, please email : [email protected]
Motivation
The fundamental operators of the relational algebra
introduced in Chapter 4A are sufficient to express
any relational-algebra query.
However, if we restrict ourselves to just the
fundamental operators, certain common queries are
lengthy to express.
Therefore, we define additional operators that do not add any
power to the algebra, but simplify common queries
Additional operators
Set intersection ( ∩ )
Natural join ( ⋈ )
Assignment ( ← )
Left outer join ( ⟕ ), Right outer join ( ⟖ )
Division ( ÷ )
Many more
Θ-join, semi join, anti join, full outer join.
Set intersection
Notation: R ∩ S
R ∩ S = R – (R – S)
R and S must have the same number of attributes, and their
attribute domains must be compatible.
Example
Query: Find the employee_id of employees who work
in department 1 and department 3.
SQL
SELECT employee_id
FROM Works_in
WHERE department_id=1
INTERSECT
SELECT employee_id
FROM Works_in
WHERE department_id=3
Relational algebra with set intersection
( π employee_id ( σ department_id=1 ( Works_in ) ) ) ∩ ( π employee_id ( σ department_id=3 ( Works_in ) ) )
Equivalent relational algebra with only fundamental operators
( π employee_id ( σ department_id=1 ( Works_in ) ) ) –
( ( π employee_id ( σ department_id=1 ( Works_in ) ) ) – ( π employee_id ( σ department_id=3 ( Works_in ) ) ) )
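The identity R ∩ S = R – (R – S) used in this example can be checked with a minimal Python sketch. The Works_in tuples below are hypothetical (the slide does not list the data for this query), and the helper name employees_in is only illustrative.

```python
# Hypothetical Works_in tuples (employee_id, department_id); not taken from the slides.
works_in = {(1, 1), (2, 1), (2, 3), (3, 3), (4, 1), (4, 3)}


def employees_in(dept):
    """π employee_id ( σ department_id=dept ( Works_in ) )"""
    return {emp for (emp, d) in works_in if d == dept}


r = employees_in(1)
s = employees_in(3)

with_intersection = r & s        # R ∩ S
fundamental_only = r - (r - s)   # R – (R – S), using set difference only

print(sorted(with_intersection))
print(with_intersection == fundamental_only)
```

Both expressions select the same employees, which is why set intersection adds convenience but no extra expressive power.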
Set intersection
For your reference,
there is another way to
answer the same query
by joining the
Works_in table with
itself.
Natural join
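The schema of R ⋈ S is R-schema ∪ S-schema
(repeated attributes are removed).
For each pair of tuples tr from R and ts from S,
if tr and ts share the same value over each of the common
attributes in R and S, a tuple t is added to the result, where
t has the same values as tr on the attributes of R and
the same values as ts on the attributes of S.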
Natural join
R
A  B
1  1
1  2
2  3
S
A  C
1  2
2  1
Common attributes: R ∩ S = {A}
Attributes of the resulting relation: R ∪ S = {A, B, C}
R ⋈ S
A  B  C
1  1  2
1  2  2
2  3  1
R ⋈ S is equivalent to: π R.A, B, C ( σ R.A=S.A ( R × S ) )
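The natural join of the example relations R and S can be sketched in Python as a nested loop over tuple pairs. This is only an illustrative sketch: relations are modelled as lists of dicts (attribute name to value), and the function name natural_join is an assumption, not notation from the slides.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    out = []
    for tr in r:
        for ts in s:
            common = tr.keys() & ts.keys()           # common attributes
            if all(tr[a] == ts[a] for a in common):  # same value on each of them
                out.append({**tr, **ts})             # repeated attributes kept once
    return out


R = [{"A": 1, "B": 1}, {"A": 1, "B": 2}, {"A": 2, "B": 3}]
S = [{"A": 1, "C": 2}, {"A": 2, "C": 1}]

result = natural_join(R, S)
for t in result:
    print(t)
```

The three printed tuples correspond to the rows (1, 1, 2), (1, 2, 2) and (2, 3, 1) of R ⋈ S.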
Assignment
It is convenient to write a relational-algebra expression
by assigning parts of it to temporary relation variables.
Left outer join
R ⟕ S = ( R ⋈ S ) ∪ ( ( R – π R ( R ⋈ S ) ) × { (null, … , null) } )
Customer
customer_id  name    address
C1           Kit     CB320
C2           Ben     CB326
C3           Jolly   CB311
C4           Yvonne  CB415
Depositor
account_id  customer_id
A1          C1
A1          C2
A2          C2
A3          C4
A4          C4
Customer ⋈ Depositor
customer_id  name    address  account_id
C1           Kit     CB320    A1
C2           Ben     CB326    A1
C2           Ben     CB326    A2
C4           Yvonne  CB415    A3
C4           Yvonne  CB415    A4
Customer ⟕ Depositor keeps all of these tuples and adds ( C3, Jolly, CB311, null ),
because C3 has no matching tuple in Depositor.
Right outer join
R ⟖ S = ( R ⋈ S ) ∪ ( { (null, … , null) } × ( S – π S ( R ⋈ S ) ) )
Customer
customer_id  name    address
C1           Kit     CB320
C2           Ben     CB326
C3           Jolly   CB311
C4           Yvonne  CB415
Depositor
account_id  customer_id
A1          C1
A1          C2
A2          C2
A3          C4
A4          C5
Customer ⋈ Depositor
customer_id  name    address  account_id
C1           Kit     CB320    A1
C2           Ben     CB326    A1
C2           Ben     CB326    A2
C4           Yvonne  CB415    A3
Customer ⟖ Depositor keeps all of these tuples and adds ( C5, null, null, A4 ),
because the Depositor tuple ( A4, C5 ) has no matching tuple in Customer.
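The two outer joins can be sketched in Python on top of a nested-loop natural join. This is a minimal sketch under stated assumptions: relations are lists of dicts, null is modelled as None, and the function names are illustrative. The data is the Customer and Depositor example in which account A4 belongs to C5.

```python
def matches(tr, ts):
    """True if tr and ts agree on every common attribute."""
    return all(tr[a] == ts[a] for a in tr.keys() & ts.keys())


def natural_join(r, s):
    return [{**tr, **ts} for tr in r for ts in s if matches(tr, ts)]


def left_outer_join(r, s):
    """(R join S) plus every unmatched tuple of R padded with nulls (None)."""
    s_attrs = set().union(*(ts.keys() for ts in s))
    out = natural_join(r, s)
    for tr in r:
        if not any(matches(tr, ts) for ts in s):
            out.append({**{a: None for a in s_attrs}, **tr})
    return out


def right_outer_join(r, s):
    """Mirror image: keep every tuple of S instead."""
    return left_outer_join(s, r)


customer = [
    {"customer_id": "C1", "name": "Kit", "address": "CB320"},
    {"customer_id": "C2", "name": "Ben", "address": "CB326"},
    {"customer_id": "C3", "name": "Jolly", "address": "CB311"},
    {"customer_id": "C4", "name": "Yvonne", "address": "CB415"},
]
depositor = [
    {"account_id": "A1", "customer_id": "C1"},
    {"account_id": "A1", "customer_id": "C2"},
    {"account_id": "A2", "customer_id": "C2"},
    {"account_id": "A3", "customer_id": "C4"},
    {"account_id": "A4", "customer_id": "C5"},
]

left = left_outer_join(customer, depositor)
right = right_outer_join(customer, depositor)
print(len(left), len(right))
```

With this data the left outer join pads the C3 tuple, and the right outer join pads the ( A4, C5 ) tuple.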
Division
Notation: R ÷ S
R
A  B
1  1
2  1
2  2
3  3
4  1
4  2
4  3
S
B
1
2
R ÷ S
A
2
4
Definition
Let S ⊆ R, i.e., the attributes in relation S are a subset of
the attributes in relation R.
R ÷ S = { t | t ∈ π R-S ( R ) ∧ ∀ s ∈ S, ( (t ∪ s) ∈ R ) }
Condition 1. A resulting tuple t has to be in the relation π R-S ( R ).
π R-S ( R )
A
1
2
3
4
Condition 2. If we combine t with each tuple s ∈ S, all the combined
tuples have to be included in R.
Example: for t2 = ( A=2 ) ∈ π R-S ( R ), combining t2 with every s ∈ S gives
A  B
2  1
2  2
In this case, both tuples are in R, so t2 is in the result of R ÷ S.
Division
Observation
Let’s focus on the result of R ÷ S (say, the tuple A=2). It means that,
for the tuples in relation R whose value in attribute A equals 2,
the values of those tuples in attribute B cover ALL values in attribute B of S.
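The "covers all values" reading of division can be sketched directly in Python. This is an illustrative sketch for the two-attribute case only: R is a set of (A, B) pairs from the division example, S is the set of B values, and the function name divide is an assumption.

```python
# R(A, B) and S(B) from the division example.
R = {(1, 1), (2, 1), (2, 2), (3, 3), (4, 1), (4, 2), (4, 3)}
S = {1, 2}


def divide(r, s):
    """R ÷ S for R(A, B) and S(B)."""
    candidates = {a for (a, _) in r}  # condition 1: t is in π A ( R )
    # condition 2: t combined with every s in S must be a tuple of R
    return {a for a in candidates if all((a, b) in r for b in s)}


print(sorted(divide(R, S)))  # [2, 4]
```

A=1 and A=3 are rejected because their B values ({1} and {3}) do not cover S = {1, 2}.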
Division
R × S
A  B  C  D
1  1  1  2
1  1  2  2
1  2  1  2
1  2  2  2
2  3  1  2
2  3  2  2
(R × S) ÷ S = R
A  B
1  1
1  2
2  3
Self-study question
How about (R ÷ S) × S = R ?
Is this true in relational algebra?
Can you prove / disprove it?
Section 2
Extended
operators
Aggregation
An aggregation function takes a collection of values and
returns a single-valued result.
e.g., avg, min, max, sum, count, count-distinct
Aggregation
Notation: G1, G2, …, Gn g F1(A1), F2(A2), …, Fn(An) ( E )
E – a relation (can be the result of a relational-algebra
expression).
G1, …, Gn – attributes used to form groups.
Tuples with the same values in G1 to Gn are put into the
same group.
G can be empty, which means that the whole relation is
one group.
Fi(Ai) – an aggregate function applied on an attribute.
Aggregation
Account
branch_id  account_id  balance
B1         A1          500
B2         A2          400
B2         A3          900
B1         A4          700
Step 1. Let’s group the tuples in Account according to their branch_id.
Step 2. Then aggregate the tuples in each group by summing their
values in the balance attribute.
Step 3. Since the resulting relation has no name after aggregation,
we use the renaming operator to give names to the relation and
its attributes.
ρ Result(branch_id, sum_of_balance) ( branch_id g sum(balance) ( Account ) )
Result
branch_id  sum_of_balance
B1         1200
B2         1300
Aggregation
Student
student_id  name    dpt_id  gender
1           Peter   1       M
2           Sharon  1       F
3           David   2       M
4           Joe     2       M
5           Betty   1       F
Note that grouping can be done on multiple attributes.
E.g., in this case, we group tuples in Student with the
same values in both the dpt_id and gender attributes,
i.e., we are finding the number of male / female
students in each department.
ρ Result(dpt_id, gender, count) ( dpt_id, gender g count() ( Student ) )
Result
dpt_id  gender  count
1       M       1
1       F       2
2       M       2
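The grouping-then-aggregating steps of the aggregation operator can be sketched in Python as follows. This is a minimal sketch, assuming relations are lists of dicts; the function name aggregate and its parameters are illustrative and not part of the slides. It uses the Account and Student tables from the two aggregation examples.

```python
from collections import defaultdict


def aggregate(relation, group_by, func, attr=None):
    """Group tuples on the group_by attributes, then apply func to each group.

    If attr is given, func receives the list of values of that attribute;
    otherwise it receives the list of whole tuples (useful for counting).
    """
    groups = defaultdict(list)
    for t in relation:
        key = tuple(t[g] for g in group_by)  # an empty group_by gives one group
        groups[key].append(t if attr is None else t[attr])
    return {key: func(values) for key, values in groups.items()}


account = [
    {"branch_id": "B1", "account_id": "A1", "balance": 500},
    {"branch_id": "B2", "account_id": "A2", "balance": 400},
    {"branch_id": "B2", "account_id": "A3", "balance": 900},
    {"branch_id": "B1", "account_id": "A4", "balance": 700},
]
student = [
    {"student_id": 1, "name": "Peter", "dpt_id": 1, "gender": "M"},
    {"student_id": 2, "name": "Sharon", "dpt_id": 1, "gender": "F"},
    {"student_id": 3, "name": "David", "dpt_id": 2, "gender": "M"},
    {"student_id": 4, "name": "Joe", "dpt_id": 2, "gender": "M"},
    {"student_id": 5, "name": "Betty", "dpt_id": 1, "gender": "F"},
]

sum_by_branch = aggregate(account, ["branch_id"], sum, "balance")
count_by_dpt_gender = aggregate(student, ["dpt_id", "gender"], len)
total_balance = aggregate(account, [], sum, "balance")  # whole relation as one group

print(sum_by_branch)
print(count_by_dpt_gender)
print(total_balance)
```

The last call illustrates the case where G is empty, so the whole relation forms a single group.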
Section 3
Algebraic
properties
Transformation of expression
A query can be expressed in several different ways,
with different cost of evaluation.
Equivalence rules
Rule 1. Only the final operation in a sequence of
projection operations is needed.
πL1 (πL2 (…(πLn ( E ))…)) = πL1 (E)
Expression
π employee_id ( π employee_id,salary ( Employee ) )
Expression tree
π employee_id
      |
π employee_id,salary
      |
  Employee
The expression tree tells which operator is executed ahead of another.
It allows transformation of the execution order by
applying the equivalence rules (altering the tree).
Employee
employee_id  name   salary
1            Jones  26000
2            Smith  28000
3            Smith  24000
π employee_id,salary ( Employee )
employee_id  salary
1            26000
2            28000
3            24000
π employee_id ( π employee_id,salary ( Employee ) )
employee_id
1
2
3
Equivalence rules
Rule 1. Only the final operation in a sequence of
projection operations is needed.
πL1 (πL2 (…(πLn ( E ))…)) = πL1 (E)
π employee_id ( π employee_id,salary ( Employee ) )
is equivalent to
π employee_id ( Employee )
Expression tree
π employee_id
      |
π employee_id,salary   (remove it)
      |
  Employee
Transformed expression tree
π employee_id
      |
  Employee
Employee
employee_id  name   salary
1            Jones  26000
2            Smith  28000
3            Smith  24000
π employee_id ( Employee )
employee_id
1
2
3
Equivalence rules
Rule 2. Conjunctive selection operations can be
deconstructed into a sequence of individual selections.
σp1 ∧ p2 ( E ) = σp1 ( σp2 ( E ) )
σ name="Smith" ∧ salary>24000 ( Employee )
is equivalent to
σ name="Smith" ( σ salary>24000 ( Employee ) )
Employee
employee_id  name   salary
1            Jones  26000
2            Smith  28000
3            Smith  24000
σ salary>24000 ( Employee )
employee_id  name   salary
1            Jones  26000
2            Smith  28000
σ name="Smith" ( σ salary>24000 ( Employee ) )
employee_id  name   salary
2            Smith  28000
You may wonder why breaking down conjunctive selections
is useful.
We will show that it helps to reduce the size of temporary results.
We can try to push each selection predicate down the tree
(to perform selection as early as possible).
Equivalence rules
Rule 3. Selection operations are commutative.
σp1 ( σp2 ( E ) ) = σp2 ( σp1 ( E ) )
Employee
employee_id  name   salary
1            Jones  26000
2            Smith  28000
3            Smith  24000
4            David  25000
Execution 1: σ name="Smith" ( σ salary>24000 ( Employee ) )
σ salary>24000 ( Employee )
employee_id  name   salary
1            Jones  26000
2            Smith  28000
4            David  25000
σ name="Smith" ( σ salary>24000 ( Employee ) )
employee_id  name   salary
2            Smith  28000
Execution 2: σ salary>24000 ( σ name="Smith" ( Employee ) )
σ name="Smith" ( Employee )
employee_id  name   salary
2            Smith  28000
3            Smith  24000
σ salary>24000 ( σ name="Smith" ( Employee ) )
employee_id  name   salary
2            Smith  28000
Note that the two executions have different costs. In particular,
the sizes of their temporary relations are different.
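The effect of the selection order on the temporary relation can be sketched in Python. This is only an illustration: tuples are (employee_id, name, salary), and the helper names select, is_smith and high_salary are assumptions introduced for this example.

```python
employee = [
    (1, "Jones", 26000),
    (2, "Smith", 28000),
    (3, "Smith", 24000),
    (4, "David", 25000),
]


def select(pred, relation):
    """σ pred ( relation )"""
    return [t for t in relation if pred(t)]


def is_smith(t):
    return t[1] == "Smith"


def high_salary(t):
    return t[2] > 24000


# Execution 1: salary predicate first, then name.
tmp_1 = select(high_salary, employee)
result_1 = select(is_smith, tmp_1)

# Execution 2: name predicate first, then salary.
tmp_2 = select(is_smith, employee)
result_2 = select(high_salary, tmp_2)

print(result_1 == result_2, len(tmp_1), len(tmp_2))
```

Both orders return the same tuple, but the temporary relation holds 3 tuples in the first execution and 2 in the second.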
Equivalence rules
Rule 4. Natural join operations are associative.
( E1⋈ E2 ) ⋈ E3 = E1⋈ ( E2 ⋈ E3 )
Employee
employee_id  name    salary
1            Jones   26000
2            Smith   28000
3            Parker  35000
4            Smith   24000
Works_in
employee_id  department_id
1            1
2            1
2            2
3            3
4            3
Expression tree A. ( Employee ⋈ Works_in ) ⋈ Department
            ⋈
          /   \
         ⋈     Department
       /   \
Employee   Works_in
Employee ⋈ Works_in
employee_id  name    salary  department_id
1            Jones   26000   1
2            Smith   28000   1
2            Smith   28000   2
3            Parker  35000   3
4            Smith   24000   3
Natural join evaluates 4*5 = 20 combinations; the
resulting temporary relation consists of 5 tuples and 4 columns.
Expression tree B. Employee ⋈ ( Works_in ⋈ Department )
       ⋈
     /   \
Employee  ⋈
        /   \
 Works_in   Department
Works_in ⋈ Department
employee_id  department_id  name
1            1              Toys
2            1              Toys
2            2              Tools
3            3              Food
4            3              Food
Natural join evaluates 5*3 = 15 combinations; the
resulting temporary relation consists of 5 tuples and 3 columns.
Result of both expression trees
employee_id  name    salary  department_id  name
1            Jones   26000   1              Toys
2            Smith   28000   1              Toys
2            Smith   28000   2              Tools
3            Parker  35000   3              Food
4            Smith   24000   3              Food
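The two join orders can be compared with a small Python sketch that also counts how many tuple pairs the first join has to examine. Assumptions to note: relations are lists of dicts, the join is a naive nested loop, and the department name attribute is called dept_name here (a hypothetical rename) so that it does not collide with Employee.name in a natural join. The Department tuples are inferred from the joined tables shown on the slide.

```python
def natural_join(r, s):
    """Nested-loop natural join; also returns the number of tuple pairs compared."""
    out = []
    for tr in r:
        for ts in s:
            if all(tr[a] == ts[a] for a in tr.keys() & ts.keys()):
                out.append({**tr, **ts})
    return out, len(r) * len(s)


def as_set(relation):
    """Order-independent view of a relation, for comparing results."""
    return {frozenset(t.items()) for t in relation}


employee = [
    {"employee_id": 1, "name": "Jones", "salary": 26000},
    {"employee_id": 2, "name": "Smith", "salary": 28000},
    {"employee_id": 3, "name": "Parker", "salary": 35000},
    {"employee_id": 4, "name": "Smith", "salary": 24000},
]
works_in = [
    {"employee_id": e, "department_id": d}
    for e, d in [(1, 1), (2, 1), (2, 2), (3, 3), (4, 3)]
]
department = [
    {"department_id": 1, "dept_name": "Toys"},
    {"department_id": 2, "dept_name": "Tools"},
    {"department_id": 3, "dept_name": "Food"},
]

# Tree A: ( Employee join Works_in ) join Department
tmp_a, pairs_a = natural_join(employee, works_in)
result_a, _ = natural_join(tmp_a, department)

# Tree B: Employee join ( Works_in join Department )
tmp_b, pairs_b = natural_join(works_in, department)
result_b, _ = natural_join(employee, tmp_b)

print(pairs_a, pairs_b, as_set(result_a) == as_set(result_b))
```

The first join compares 20 pairs in tree A and 15 pairs in tree B, while the final relations are identical.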
Equivalence rules
Expression tree B. σ Employee.name="Smith" ( Employee ) ⋈ Works_in
                 ⋈
               /   \
σ Employee.name="Smith"   Works_in
           |
       Employee
σ Employee.name="Smith" ( Employee )
employee_id  name   salary
2            Smith  28000
4            Smith  24000
σ Employee.name="Smith" ( Employee ) ⋈ Works_in
employee_id  name   salary  department_id
2            Smith  28000   1
2            Smith  28000   2
4            Smith  24000   3
Similarly, the tree
σ Employee.name="Smith" ∧ Works_in.department_id=3
                 |
                 ⋈
is equivalent to
                 ⋈
               /   \
σ Employee.name="Smith"   σ Works_in.department_id=3
Equivalence rules
Expression tree A. σ Employee.name="Smith" ∧ Works_in.department_id=3 ( Employee ⋈ Works_in )
σ Employee.name="Smith" ∧ Works_in.department_id=3
                 |
                 ⋈
               /   \
        Employee   Works_in
Employee
employee_id  name    salary
1            Jones   26000
2            Smith   28000
3            Parker  35000
4            Smith   24000
Works_in
employee_id  department_id
1            1
2            1
2            2
3            3
4            3
Employee ⋈ Works_in
employee_id  name    salary  department_id
1            Jones   26000   1
2            Smith   28000   1
2            Smith   28000   2
3            Parker  35000   3
4            Smith   24000   3
σ Employee.name="Smith" ∧ Works_in.department_id=3 ( Employee ⋈ Works_in )
employee_id  name   salary  department_id
4            Smith  24000   3
Equivalence rules
Expression tree B. ( σ Employee.name="Smith" ( Employee ) ⋈ σ Works_in.department_id=3 ( Works_in ) )
                    ⋈
                 /     \
σ Employee.name="Smith"   σ Works_in.department_id=3
           |                         |
       Employee                  Works_in
Employee
employee_id  name    salary
1            Jones   26000
2            Smith   28000
3            Parker  35000
4            Smith   24000
Works_in
employee_id  department_id
1            1
2            1
2            2
3            3
4            3
( σ Employee.name="Smith" ( Employee ) ⋈ σ Works_in.department_id=3 ( Works_in ) )
employee_id  name   salary  department_id
4            Smith  24000   3
Natural join evaluates 4 combinations.
Compared with expression tree A, we can see that if we
push the selection predicates down below the
natural join (perform selection earlier
than joining), the natural join
considers fewer combinations.
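The saving from pushing the selections below the join can be sketched in Python by counting the tuple pairs a naive nested-loop join compares. This is an illustrative sketch only: relations are lists of dicts and natural_join is a hypothetical helper, not something defined in the slides.

```python
def natural_join(r, s):
    """Nested-loop natural join; also returns the number of tuple pairs compared."""
    out = []
    for tr in r:
        for ts in s:
            if all(tr[a] == ts[a] for a in tr.keys() & ts.keys()):
                out.append({**tr, **ts})
    return out, len(r) * len(s)


employee = [
    {"employee_id": 1, "name": "Jones", "salary": 26000},
    {"employee_id": 2, "name": "Smith", "salary": 28000},
    {"employee_id": 3, "name": "Parker", "salary": 35000},
    {"employee_id": 4, "name": "Smith", "salary": 24000},
]
works_in = [
    {"employee_id": e, "department_id": d}
    for e, d in [(1, 1), (2, 1), (2, 2), (3, 3), (4, 3)]
]

# Tree A: join first, then apply the conjunctive selection.
joined, pairs_a = natural_join(employee, works_in)
result_a = [t for t in joined if t["name"] == "Smith" and t["department_id"] == 3]

# Tree B: select on each input first, then join the smaller relations.
smiths = [t for t in employee if t["name"] == "Smith"]
dept_3 = [t for t in works_in if t["department_id"] == 3]
result_b, pairs_b = natural_join(smiths, dept_3)

print(pairs_a, pairs_b, result_a == result_b)
```

Tree A compares 4*5 = 20 pairs, tree B only 2*2 = 4, and both return the single tuple for employee 4.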
Equivalence rules
Rule 6. The projection operation can distribute over the
natural join operation.
Equivalence rules
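πL1 ∪ L2 ( E1 ⋈ E2 ) = πL1 ∪ L2 ( ( πL1 ∪ L3 ( E1 ) ) ⋈ ( πL2 ∪ L3 ( E2 ) ) )
Here L1 and L2 are the wanted attributes of E1 and E2, and L3 contains the join attributes.
π Employee.name, Works_in.department_id (
  π Employee.name, Employee.employee_id ( Employee )
  ⋈
  π Works_in.department_id, Works_in.employee_id ( Works_in )
)
Expression tree
        π Employee.name, Works_in.department_id
                         |
                         ⋈
                   /           \
π Employee.name,                 π Works_in.department_id,
  Employee.employee_id             Works_in.employee_id
          |                                |
      Employee                         Works_in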
Equivalence rules
Expression tree A. π Employee.name, Works_in.department_id ( Employee ⋈ Works_in )
π Employee.name, Works_in.department_id
                 |
                 ⋈
               /   \
        Employee   Works_in
Employee ⋈ Works_in
employee_id  name    salary  department_id  since
1            Jones   26000   1              2012/1/1
2            Smith   28000   1              2011/3/2
2            Smith   28000   2              2014/2/1
3            Parker  35000   3              2013/2/2
4            Smith   24000   3              2013/2/8
π Employee.name, Works_in.department_id ( Employee ⋈ Works_in )
name    department_id
Jones   1
Smith   1
Smith   2
Parker  3
Smith   3
Rule 7. The set operations union and intersection
are commutative.
E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1
The set difference operation is NOT
commutative.
E1 - E2 ≠ E2 - E1
Equivalence rules
Rule 8. The set operations union and intersection
are associative.
(E1 ∪ E2) ∪ E3 = E1 ∪ ( E2 ∪ E3 )
(E1 ∩ E2) ∩ E3 = E1 ∩ ( E2 ∩ E3 )
The set difference operation is NOT associative.
(E1 - E2) - E3 ≠ E1 - ( E2 - E3 )
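A small Python check with concrete sets illustrates why union and intersection are associative while set difference is not. The three sets E1, E2 and E3 below are hypothetical values chosen only for this illustration.

```python
# Hypothetical sets chosen for illustration.
E1 = {1, 2, 3}
E2 = {2, 3}
E3 = {3}

# Union and intersection: grouping does not matter.
union_ok = (E1 | E2) | E3 == E1 | (E2 | E3)
intersection_ok = (E1 & E2) & E3 == E1 & (E2 & E3)

# Set difference: the two groupings give different results.
left_grouping = (E1 - E2) - E3    # {1} - {3}
right_grouping = E1 - (E2 - E3)   # {1, 2, 3} - {2}

print(union_ok, intersection_ok)
print(sorted(left_grouping), sorted(right_grouping))
```

Here (E1 - E2) - E3 is {1} while E1 - (E2 - E3) is {1, 3}; a single example like this is enough to show the inequality, although it does not prove the equalities in general.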
Equivalence rules
Rule 9. The selection operation distributes over the
union, intersection and set difference operations.
σp ( E1 ∪ E2 ) = σp ( E1 ) ∪ σp ( E2 )
σp ( E1 ∩ E2 ) = σp ( E1 ) ∩ σp ( E2 )
σp ( E1 - E2 ) = σp ( E1 ) - σp ( E2 )
Audio_CD
ID   name       provider_id  stock  #tracks
CD1  One Heart  P1           55     14
CD2  Miracle    P2           4      14
DVD
ID    name                    provider_id  stock  length
DVD1  Prince of Persia        P2           3      110
DVD2  Iron man 3              P3           60     90
DVD3  Legend is born: Ip Man  P3           17     90
σ stock<10 (
  π name, provider_id, stock ( Audio_CD )
  ∪
  π name, provider_id, stock ( DVD )
)
equivalent to
σ stock<10 ( π name, provider_id, stock ( Audio_CD ) )
∪
σ stock<10 ( π name, provider_id, stock ( DVD ) )
Equivalence rules
σ stock<10 (
  π name, provider_id, stock ( Audio_CD )
  ∪
  π name, provider_id, stock ( DVD )
)
name              provider_id  stock
Miracle           P2           4
Prince of Persia  P2           3
Equivalence rules
The projection operation distributes over union, but not over
intersection or set difference.
πL ( E1 ∪ E2 ) = πL ( E1 ) ∪ πL ( E2 )
πL ( E1 ∩ E2 ) ≠ πL ( E1 ) ∩ πL ( E2 )
πL ( E1 - E2 ) ≠ πL ( E1 ) - πL ( E2 )
Equivalence rules
Why does projection not distribute over
intersection and set difference?
R
A  B
1  1
S
A  B
1  2
πA ( R ) = { 1 } and πA ( S ) = { 1 }, so πA ( R ) ∩ πA ( S ) = { 1 }.
R ∩ S is empty, so πA ( R ∩ S ) is empty.
Therefore πA ( R ∩ S ) ≠ πA ( R ) ∩ πA ( S ).
Now, can you try to construct a
counter example to show that
projection does not distribute over
set difference?
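The intersection counterexample with R = { (1, 1) } and S = { (1, 2) } can be replayed in Python. This is a minimal sketch in which relations are sets of (A, B) tuples and project_a is an illustrative helper name.

```python
R = {(1, 1)}  # tuples are (A, B)
S = {(1, 2)}


def project_a(relation):
    """π A ( relation )"""
    return {a for (a, _) in relation}


lhs = project_a(R & S)             # project after intersecting
rhs = project_a(R) & project_a(S)  # intersect after projecting

print(lhs, rhs)
```

Projecting first discards attribute B, which is exactly the attribute that made the two tuples different, so the right-hand side contains a value that the left-hand side does not.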
Equivalence rules
Note that the equivalence rules listed are just a
partial list of equivalences.
Section 4
Example of
query optimization
Transformation
Find the names of all instructors in the CS department (dpt_id =
1) who have taught a course in the 2nd semester, together with the
course titles of all the courses that the instructors teach.
Rule 5b
σp1 ∧ p2 ( E1 ⋈ E2 ) = ( σp1 ( E1 ) ⋈ σp2 ( E2 ) )
Now we can further push the selection one more level down by
applying Rule 5b. According to Rule 5b, we can distribute
- σ I.dpt_id=1 to I
- σ T.sem=2 to T
BEFORE
              πI.name,C.title
                     |
                     ⋈
                 /       \
σ I.dpt_id=1 ∧ T.sem=2    πC.course_id,C.title
            |                     |
            ⋈                     C
          /   \
         I     T
AFTER Rule 5b
              πI.name,C.title
                     |
                     ⋈
                 /       \
                ⋈         πC.course_id,C.title
            /       \             |
σ I.dpt_id=1    σ T.sem=2         C
      |              |
      I              T
Illustration (original tree)
πI.name,C.title (
  σ I.dpt_id=1 ∧ T.sem=2 (
    ρI ( Instructor ) ⋈
    ( ρT ( Teaches ) ⋈ ρC ( Course ) )
  )
)
Expression tree
   πI.name,C.title
          |
σ I.dpt_id=1 ∧ T.sem=2
          |
          ⋈
        /   \
       I     ⋈
           /   \
          T     C
ρI ( Instructor ) ⋈ ( ρT ( Teaches ) ⋈ ρC ( Course ) )
instructor_id  name     dpt_id  course_id  sem  title          credit
1              Kit      1       1          1    Intro to DB    6
1              Kit      1       2          2    Programming I  6
2              Ben      1       4          1    Algorithms     6
3              Michael  2       3          2    Accounting     6
σ I.dpt_id=1 ∧ T.sem=2 ( ρI ( Instructor ) ⋈ ( ρT ( Teaches ) ⋈ ρC ( Course ) ) )
instructor_id  name  dpt_id  course_id  sem  title          credit
1              Kit   1       2          2    Programming I  6
πI.name,C.title ( σ I.dpt_id=1 ∧ T.sem=2 ( ρI ( Instructor ) ⋈ ( ρT ( Teaches ) ⋈ ρC ( Course ) ) ) )
name  title
Kit   Programming I
Illustration (transformed tree)
πI.name,C.title (
  ( σ I.dpt_id=1 ( ρI ( Instructor ) )
    ⋈
    σ T.sem=2 ( ρT ( Teaches ) )
  )
  ⋈ π C.course_id, C.title ( ρC ( Course ) )
)
Expression tree
              πI.name,C.title
                     |
                     ⋈
                 /       \
                ⋈         πC.course_id,C.title
            /       \             |
σ I.dpt_id=1    σ T.sem=2         C
      |              |
      I              T
σ I.dpt_id=1 ( ρI ( Instructor ) )
instructor_id  name  dpt_id
1              Kit   1
2              Ben   1
σ T.sem=2 ( ρT ( Teaches ) )
instructor_id  course_id  sem
1              2          2
3              3          2
σ I.dpt_id=1 ( ρI ( Instructor ) ) ⋈ σ T.sem=2 ( ρT ( Teaches ) )
instructor_id  name  dpt_id  course_id  sem
1              Kit   1       2          2
Natural join evaluates 2*2 = 4 combinations.
π C.course_id, C.title ( ρC ( Course ) )
course_id  title
1          Intro to DB
2          Programming I
3          Accounting
4          Algorithms
πI.name,C.title ( σ I.dpt_id=1 ( ρI ( Instructor ) ) ⋈ σ T.sem=2 ( ρT ( Teaches ) ) ⋈ π C.course_id, C.title ( ρC ( Course ) ) )
name  title
Kit   Programming I
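The original and the transformed expression trees can be compared end to end with a short Python sketch. Assumptions to note: relations are lists of dicts, the join is a naive nested loop, and the base tables Instructor, Teaches and Course are reconstructed from the joined table in the illustration rather than listed on the slides. The helper names are illustrative.

```python
def natural_join(r, s):
    """Nested-loop natural join; also returns the number of tuple pairs compared."""
    out = []
    for tr in r:
        for ts in s:
            if all(tr[a] == ts[a] for a in tr.keys() & ts.keys()):
                out.append({**tr, **ts})
    return out, len(r) * len(s)


def project(relation, attrs):
    """Projection with duplicate elimination, returned as a set of tuples."""
    return {tuple(t[a] for a in attrs) for t in relation}


# Base tables reconstructed from the joined table (an assumption).
instructor = [
    {"instructor_id": 1, "name": "Kit", "dpt_id": 1},
    {"instructor_id": 2, "name": "Ben", "dpt_id": 1},
    {"instructor_id": 3, "name": "Michael", "dpt_id": 2},
]
teaches = [
    {"instructor_id": i, "course_id": c, "sem": s}
    for i, c, s in [(1, 1, 1), (1, 2, 2), (2, 4, 1), (3, 3, 2)]
]
course = [
    {"course_id": c, "title": t, "credit": 6}
    for c, t in [(1, "Intro to DB"), (2, "Programming I"),
                 (3, "Accounting"), (4, "Algorithms")]
]

# Original tree: join everything, then select, then project.
tc, pairs_original = natural_join(teaches, course)
itc, _ = natural_join(instructor, tc)
selected = [t for t in itc if t["dpt_id"] == 1 and t["sem"] == 2]
original = project(selected, ["name", "title"])

# Transformed tree: select and project as early as possible, then join.
cs_instructors = [t for t in instructor if t["dpt_id"] == 1]
sem2_teaches = [t for t in teaches if t["sem"] == 2]
slim_course = [{"course_id": t["course_id"], "title": t["title"]} for t in course]
it, pairs_transformed = natural_join(cs_instructors, sem2_teaches)
joined, _ = natural_join(it, slim_course)
transformed = project(joined, ["name", "title"])

print(original == transformed, pairs_original, pairs_transformed)
```

Both trees return the single tuple (Kit, Programming I), but the first join in the transformed tree compares 2*2 = 4 pairs instead of the 4*4 = 16 pairs compared by Teaches ⋈ Course in the original tree.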
Summary
Relational algebra (RA) defines a set of algebraic
operations on tables that output tables as results.
6 fundamental operations (in Chapter 4A).
Additional operations do not extend the power of the
fundamental operators, but they simplify expressions.
Extended operations add expressive power.
END