Outer Join and Aggregate Function
Join
Sanghita Bhattacharjee
Department of CSE
NIT Durgapur
References
• A Silberschatz, H F Korth and S Sudarshan, Database System Concepts, 5th Edition, 2006
• Video lectures:
(i) Database Management System by Prof. Partha Pratim Das
(ii) Introduction to database systems by Prof. P. Sreenivasa Kumar
(iii) Online DBMS tutorials
Join
• Binary operator
• Denoted by ⋈
• Join combines related tuples from two relations into single tuples
• Join is useful because it lets us process the relationships among relations
• Cartesian Product:
• All combinations of tuples are included in the result
• Only certain tuples in the result are meaningful
• Useful when a select follows the Cartesian Product
• Join: only the combinations of tuples that satisfy the join condition c appear in the result
R ⋈𝑐 S = 𝜎𝑐 (R × S)
Example of Join
Employee HOD
Suppose we want to retrieve the names of the HODs of the various departments. We have to join the two tables/relations to retrieve the required information, i.e. Ename.
Here, the set of HODID values is a proper subset of the set of EID values.
HODID is a foreign key (FK) that references EID, the primary key (PK). Referential integrity therefore ensures that every HODID value matches an existing EID value, which keeps the two join attributes consistent.
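The definition of join as a selection over the Cartesian product can be sketched in a few lines of Python. This is only an illustration: the rows of `employee` and `hod` are hypothetical sample data (the tables of the slide are not reproduced here), and `theta_join` is a helper name introduced for this sketch.

```python
from itertools import product

def theta_join(r, s, condition):
    """Keep only the pairs of r x s that satisfy the join condition."""
    return [(tr, ts) for tr, ts in product(r, s) if condition(tr, ts)]

# Hypothetical sample relations; each tuple is a dict attribute -> value.
employee = [
    {"EID": 1, "Ename": "Asha"},
    {"EID": 2, "Ename": "Ravi"},
    {"EID": 3, "Ename": "Mina"},
]
hod = [
    {"Dept": "CSE", "HODID": 2},
    {"Dept": "ME", "HODID": 3},
]

# Cartesian product: every combination, most of them meaningless.
cartesian = list(product(employee, hod))

# Join: select only the combinations where EID = HODID.
joined = theta_join(employee, hod, lambda e, h: e["EID"] == h["HODID"])
hod_names = [(h["Dept"], e["Ename"]) for e, h in joined]
print(len(cartesian), hod_names)
```

Real database systems do not materialise the full product; the sketch only mirrors the algebraic definition.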
Types of Join
Q = R ⋈ (R.Section = S.Class) S
Equivalent SQL: SELECT SID, Cname, Credit FROM Course AS T1 INNER JOIN Enrolment AS T2 ON T1.CID = T2.Cno
Equi Join Example
R
SID   Sname   Section
101   Alex    3
102   Rohit   2

S
Class   Course
2       CS01
2       PH01
3       ME01
1       BIO01

Q = R ⋈ (R.Section = S.Class) S
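The equi join of this example can be worked out with a short Python sketch. It assumes that the second column of S is named Course (our reading of the flattened slide); `equi_join` is a helper name introduced here, not part of the lecture.

```python
# R(SID, Sname, Section) and S(Class, Course) as lists of dicts.
R = [
    {"SID": 101, "Sname": "Alex", "Section": 3},
    {"SID": 102, "Sname": "Rohit", "Section": 2},
]
S = [
    {"Class": 2, "Course": "CS01"},
    {"Class": 2, "Course": "PH01"},
    {"Class": 3, "Course": "ME01"},
    {"Class": 1, "Course": "BIO01"},
]

def equi_join(r, s, left_attr, right_attr):
    """Concatenate each pair of tuples whose join attributes are equal."""
    return [{**tr, **ts} for tr in r for ts in s
            if tr[left_attr] == ts[right_attr]]

Q = equi_join(R, S, "Section", "Class")
for row in Q:
    print(row)
```

Every row of Q carries both Section and Class with the same value, which is the redundancy that natural join removes.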
• In the result of an Equi Join, one or more pairs of attributes have identical values in every tuple because of the equality join condition
• In the previous slide, the values of the attributes Section and Class are identical for every tuple in the resulting relation Q
• One attribute of each such pair is superfluous. Natural Join was created to overcome this problem of superfluous attributes in an Equi Join
• The definition of Natural Join requires that the two join attributes (or each pair of join attributes) have the same name in both relations
Natural Join
• Natural join takes no explicit comparison operator: the join condition is implied by the common attribute names. It does not simply concatenate tuples the way the Cartesian product does
• Natural join can only be performed if the relations have at least one common attribute. Those attributes must have the same name and domain
• There can be a list of join attributes from each relation; each corresponding pair must have the same name
• Suppose relations R and S have the common attributes X1, X2, X3
• Join condition:
(R.X1 = S.X1) ∧ (R.X2 = S.X2) ∧ (R.X3 = S.X3)
i.e. two tuples are combined only if their values agree on every common attribute
• Schema of the result: Q = R ∪ (S − {X1, X2, X3})
Only one copy of the common attributes is kept
• Notation for natural join: Q = R * S
Natural Join Example
R
A   B
X   Y
X   Z
Y   Z
Z   V

S
B   C
Z   U
V   W
Z   V

R * S
A   B   C
X   Z   U
X   Z   V
Y   Z   U
Y   Z   V
Z   V   W
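The natural join of this example can be reproduced with a minimal Python sketch. The function name `natural_join` and the dict-based representation of tuples are choices made for this illustration only.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    if not r or not s:
        return []
    # Join attributes: names that occur in both schemas.
    common = [a for a in r[0] if a in s[0]]
    result = []
    for tr in r:
        for ts in s:
            if all(tr[a] == ts[a] for a in common):
                row = {**tr, **ts}      # one copy of each common attribute
                if row not in result:   # relations are sets: no duplicates
                    result.append(row)
    return result

R = [{"A": "X", "B": "Y"}, {"A": "X", "B": "Z"},
     {"A": "Y", "B": "Z"}, {"A": "Z", "B": "V"}]
S = [{"B": "Z", "C": "U"}, {"B": "V", "C": "W"}, {"B": "Z", "C": "V"}]

joined = natural_join(R, S)
for row in joined:
    print(row["A"], row["B"], row["C"])
```

The tuple (X, Y) of R finds no partner in S, so it does not appear in the result.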
More on Natural Join
• If the joining attributes of the relations do not have the same name, a renaming operation is performed first. Then the join is applied
Q = R * (𝜌A (S))
• If the attributes on which the natural join is performed have the same name in both relations, renaming is unnecessary
• If no combination of tuples satisfies the join condition, the result of the join is an empty relation with 0 tuples. If R has m tuples and S has n tuples, R * S has between 0 and m*n tuples
• If there is no join condition, all combinations of tuples qualify and the join becomes a Cartesian Product
• The natural join or Equi join can also be specified among multiple tables, leading to an n-way join
((R *a S) *b Q)
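Q. Consider two relations R(A, B, C) and S(B, D, E) with the functional dependencies
B→A
A→C
R has 200 tuples and S has 100 tuples. What is the maximum size of the natural join R * S?
Answer: 100 tuples. The only common attribute is B. From B→A and A→C we get B→C, so B is a key of R. Each tuple of S can therefore match at most one tuple of R, and the join has at most 100 tuples.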
Q. Which of the following query transformations are correct? r1 and r2 are relations, c1 and c2 are conditions, and A1 and A2 are attributes
(i ) 𝜎𝑐1 (𝜎𝑐2 ( r1)) → 𝜎𝑐2 (𝜎𝑐1 ( r1))
(ii) 𝜎𝑐1 ( r1 ∪ r2) → 𝜎𝑐1 (r1) ∪ 𝜎𝑐1 (r2)
(iii) 𝜎𝑐1 (𝜋𝐴1 ( r1)) → 𝜋𝐴1 (𝜎𝑐1 ( r1))
(iv) 𝜋𝐴1 (𝜎𝑐1 ( r1)) → 𝜎𝑐1 (𝜋𝐴1 ( r1))
Questions
• Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R * S ?
Q. Find the names and cities of all employees who work for the RS company
𝜋𝑒𝑛𝑎𝑚𝑒,𝑐𝑖𝑡𝑦 (𝜎𝑐𝑛𝑎𝑚𝑒=′𝑅𝑆′ (Works_on) * Employee)
Q. Find the name, street and city of all employees who work for RS and earn more than 20000
T1 = 𝜎𝑐𝑛𝑎𝑚𝑒=′𝑅𝑆′ ∧ 𝑠𝑎𝑙𝑎𝑟𝑦>20000 (Works_on)
Result = 𝜋𝑒𝑛𝑎𝑚𝑒,𝑠𝑡𝑟𝑒𝑒𝑡,𝑐𝑖𝑡𝑦 (T1 * Employee)
Queries in Join
Q. Companies are located in different cities. Find all companies located in every city in which the RS company is located
T1 = 𝜋𝑐𝑖𝑡𝑦 (𝜎𝑐𝑛𝑎𝑚𝑒=′𝑅𝑆′ (Company))
T2 = Company ÷ T1
Result = 𝜋𝑐𝑛𝑎𝑚𝑒 (T2)
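The division step of this query can be sketched in Python. The sketch assumes a schema Company(cname, city); the rows below are hypothetical sample data, and `divide` is a helper name introduced for illustration.

```python
def divide(r, divisor, keep, by):
    """r ÷ divisor for a binary relation r.

    Returns the values of attribute `keep` that are paired in r with
    every value of the set `divisor` on attribute `by`.
    """
    groups = {}
    for t in r:
        groups.setdefault(t[keep], set()).add(t[by])
    return sorted(k for k, vals in groups.items() if divisor <= vals)

# Hypothetical contents of Company(cname, city).
company = [
    {"cname": "RS", "city": "Kolkata"},
    {"cname": "RS", "city": "Delhi"},
    {"cname": "AB", "city": "Kolkata"},
    {"cname": "AB", "city": "Delhi"},
    {"cname": "AB", "city": "Mumbai"},
    {"cname": "XY", "city": "Delhi"},
]

# T1: cities in which RS is located.
t1 = {t["city"] for t in company if t["cname"] == "RS"}
# T2 / Result: companies located in every city of T1.
result = divide(company, t1, keep="cname", by="city")
print(result)
```

With this data RS itself is part of the answer, because it is trivially located in all of its own cities.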
Outer Join and Aggregation
Inner Join
• Inner join is one of the most frequently used types of join
• Finds all rows that meet the join condition
• Theta join, equi-join and natural join are all inner joins
• The result of these operations contains only the matching tuples
When we perform EMP * MANAGER, the resulting relation contains information only about the employees who are managers, not about the other employees. So we lose some tuples when we perform a natural join between the relations.
Information about Smith and Niya is not in the output relation.

Result of EMP * MANAGER
TID   Ename    Salary   Dept      Controlling Dept
4     Alia     40000    HR        HR
2     David    30000    IT        IT
3     Alisha   40000    SALES     SALES
5     Dev      30000    FINANCE   FINANCE
Outer Join
• There are join methods that include all the tuples of a relation in the resulting relation, whether or not they have a match. They are known as outer joins
• Using an outer join, all tuples of relation r, of relation s, or of both r and s can be included in the resulting relation
• There are three kinds of outer join
Left outer
Right outer
Full outer
• Outer join was developed to take the union of tuples from two relations that are not union compatible but only partially compatible
• Outer join can be used to avoid loss of information
Left Outer Join
• Left outer join: r ⟕ s
All the tuples of the left relation r are included in the resulting relation. If a tuple t in r has no matching tuple in s, the s-attributes of t are set to NULL in the resulting relation
Example of Outer Join
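Relations: S and F
• T1 = S ⟕ F (join condition S.EMPID = F.EID)
SID   Sname    Age   Dept   EMPID   EID     Ename     Sex
101   David    18    CSE    4P101   4P101   Alex      M
102   Joy      19    IT     4P102   4P102   Joydeep   M
104   Ronita   20    ME     4P103   4P103   Ankita    F
103   Rohit    20    CSE    NULL    NULL    NULL      NULL
105   Anu      19    CHE    NULL    NULL    NULL      NULL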
Right Outer Join Example
• T1 = S ⟖ F (every tuple of the right relation F is kept; unmatched S-attributes are NULL)
SID    Sname    Age    Dept   EMPID   EID     Ename     Sex
101    David    18     CSE    4P101   4P101   Alex      M
102    Joy      19     IT     4P102   4P102   Joydeep   M
104    Ronita   20     ME     4P103   4P103   Ankita    F
NULL   NULL     NULL   NULL   NULL    4P104   Tanmay    M

Find SID, name and the corresponding supervisor name, if any
𝑅𝑒𝑠𝑢𝑙𝑡 = 𝜋𝑆𝐼𝐷,𝑆𝑛𝑎𝑚𝑒,𝐸𝑛𝑎𝑚𝑒 (T1)
SID    Sname    Ename
101    David    Alex
102    Joy      Joydeep
104    Ronita   Ankita
NULL   NULL     Tanmay
Full Outer Join
All the tuples of both relations r and s are in the result. If a tuple has no matching tuple in the other relation, the attributes of the other relation are set to NULL
r ⟗ s
Full Outer Join
• T1 = S ⟗ F
SID    Sname    Age    Dept   EMPID   EID     Ename     Sex
101    David    18     CSE    4P101   4P101   Alex      M
102    Joy      19     IT     4P102   4P102   Joydeep   M
104    Ronita   20     ME     4P103   4P103   Ankita    F
NULL   NULL     NULL   NULL   NULL    4P104   Tanmay    M
103    Rohit    20     CSE    NULL    NULL    NULL      NULL
105    Anu      19     CHE    NULL    NULL    NULL      NULL

𝑅𝑒𝑠𝑢𝑙𝑡 = 𝜋𝑆𝐼𝐷,𝑆𝑛𝑎𝑚𝑒,𝐸𝑛𝑎𝑚𝑒 (T1)
SID    Sname    Ename
101    David    Alex
102    Joy      Joydeep
104    Ronita   Ankita
NULL   NULL     Tanmay
103    Rohit    NULL
105    Anu      NULL
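The three outer joins can be compared side by side with a small Python sketch. It assumes that the base relations S and F are the ones that can be read off the full outer join table, that the join condition is S.EMPID = F.EID, and that NULL is represented by `None`; `outer_join` and `project` are helper names introduced here.

```python
S = [  # S(SID, Sname, Age, Dept, EMPID)
    {"SID": 101, "Sname": "David", "Age": 18, "Dept": "CSE", "EMPID": "4P101"},
    {"SID": 102, "Sname": "Joy", "Age": 19, "Dept": "IT", "EMPID": "4P102"},
    {"SID": 103, "Sname": "Rohit", "Age": 20, "Dept": "CSE", "EMPID": None},
    {"SID": 104, "Sname": "Ronita", "Age": 20, "Dept": "ME", "EMPID": "4P103"},
    {"SID": 105, "Sname": "Anu", "Age": 19, "Dept": "CHE", "EMPID": None},
]
F = [  # F(EID, Ename, Sex)
    {"EID": "4P101", "Ename": "Alex", "Sex": "M"},
    {"EID": "4P102", "Ename": "Joydeep", "Sex": "M"},
    {"EID": "4P103", "Ename": "Ankita", "Sex": "F"},
    {"EID": "4P104", "Ename": "Tanmay", "Sex": "M"},
]

def outer_join(r, s, r_attr, s_attr, how="full"):
    """Outer join on r.r_attr = s.s_attr; how is 'left', 'right' or 'full'."""
    r_null = dict.fromkeys(r[0])   # all-NULL padding for r's attributes
    s_null = dict.fromkeys(s[0])   # all-NULL padding for s's attributes
    rows, matched_s = [], set()
    for tr in r:
        # A NULL join value never matches anything.
        hits = [i for i, ts in enumerate(s)
                if tr[r_attr] is not None and tr[r_attr] == ts[s_attr]]
        rows.extend({**tr, **s[i]} for i in hits)
        matched_s.update(hits)
        if not hits and how in ("left", "full"):
            rows.append({**tr, **s_null})      # keep unmatched r tuple
    if how in ("right", "full"):
        rows.extend({**r_null, **ts} for i, ts in enumerate(s)
                    if i not in matched_s)     # keep unmatched s tuples
    return rows

def project(rows, attrs):
    return [tuple(row[a] for a in attrs) for row in rows]

left = project(outer_join(S, F, "EMPID", "EID", "left"), ["SID", "Sname", "Ename"])
right = project(outer_join(S, F, "EMPID", "EID", "right"), ["SID", "Sname", "Ename"])
full = project(outer_join(S, F, "EMPID", "EID", "full"), ["SID", "Sname", "Ename"])
print(len(left), len(right), len(full))
```

The full outer join contains every row of the left and right outer joins, which is why no student and no faculty member is lost.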
Extended Relational-Algebra-Operations
• Generalized Projection
• Aggregate Functions
Aggregate Functions and Operations
• An aggregate function takes a collection of values and returns a single value as a result
avg: average value
min: minimum value
max: maximum value
sum: sum of values
count: number of values
• Aggregate operation in relational algebra
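G1, G2, …, Gn 𝒢 F1(A1), F2(A2), …, Fn(An) (E)
where E is a relational-algebra expression, G1, …, Gn is the list of attributes to group on (it may be empty), each Fi is an aggregate function and each Ai is an attribute name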
Employee
EID     DNO   HOD
4P101   CSE   X
4P102   CSE   X
4P103   ME    Y
4P104   IT    Z
4P105   ME    Y
4P106   IT    Z

Group the employees based on department number
Retrieve the number of employees in each department
𝐷𝑁𝑂 𝒢 𝑐𝑜𝑢𝑛𝑡(𝐸𝐼𝐷) (𝐸𝑚𝑝𝑙𝑜𝑦𝑒𝑒)

DNO   Count(EID)
CSE   2
IT    2
ME    2
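The grouping and counting step can be sketched in Python using the Employee rows of this slide. `group_aggregate` and `count` are helper names introduced for this illustration; they only mimic the grouping operator for a single grouping attribute and a single aggregate.

```python
from collections import defaultdict

employee = [
    {"EID": "4P101", "DNO": "CSE", "HOD": "X"},
    {"EID": "4P102", "DNO": "CSE", "HOD": "X"},
    {"EID": "4P103", "DNO": "ME", "HOD": "Y"},
    {"EID": "4P104", "DNO": "IT", "HOD": "Z"},
    {"EID": "4P105", "DNO": "ME", "HOD": "Y"},
    {"EID": "4P106", "DNO": "IT", "HOD": "Z"},
]

def group_aggregate(rows, group_attr, agg_attr, func):
    """Group rows on group_attr, then apply func to each group's agg_attr values."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_attr]].append(row[agg_attr])
    return {key: func(values) for key, values in sorted(groups.items())}

def count(values):
    """Number of non-null values in the collection."""
    return sum(1 for v in values if v is not None)

employees_per_dept = group_aggregate(employee, "DNO", "EID", count)
print(employees_per_dept)
```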
Renaming and Aggregation
• Result of aggregation does not have a name
• Can use rename operation to give it a name
𝐷𝑁𝑂 𝒢 𝑐𝑜𝑢𝑛𝑡(𝐸𝐼𝐷) 𝑎𝑠 𝑁𝑜 𝑜𝑓 𝑒𝑚𝑝𝑙𝑜𝑦𝑒𝑒 (𝐸𝑚𝑝𝑙𝑜𝑦𝑒𝑒)
𝜌𝑅(𝐷𝑁𝑂, 𝑁𝑜 𝑜𝑓 𝑒𝑚𝑝𝑙𝑜𝑦𝑒𝑒)

R
DNO   No of employee
CSE   2
IT    2
ME    2
NULL Value
• It is possible for tuples to have a null value, denoted by null, for some of their
attributes
• null signifies an unknown value or that a value does not exist
• The result of any arithmetic expression involving null is null
• Aggregate functions simply ignore null values
• For duplicate elimination and grouping, null is treated like any other value, and
two nulls are assumed to be the same
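These rules for null can be illustrated with a short Python sketch in which `None` stands in for null. The helper names and the sample `salaries` list are hypothetical; returning `None` for an aggregate over no non-null values is an assumption of this sketch.

```python
def null_add(a, b):
    """Arithmetic involving null yields null."""
    return None if a is None or b is None else a + b

def agg_sum(values):
    """Sum that ignores nulls (None if there is no non-null value)."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) if non_null else None

def agg_count(values):
    """Count of non-null values."""
    return sum(1 for v in values if v is not None)

def agg_avg(values):
    """Average over the non-null values only."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) / len(non_null) if non_null else None

salaries = [30000, None, 40000, None, 50000]   # hypothetical column with nulls

print(null_add(30000, None))   # null propagates through arithmetic
print(agg_sum(salaries), agg_count(salaries), agg_avg(salaries))

# Duplicate elimination: the two nulls collapse into one value.
distinct = set(salaries)
print(len(distinct))
```

Note that the average is taken over three values, not five, because the nulls are ignored rather than counted as zero.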
NULL Value
• Comparisons with null values return the special truth value unknown
• If false were used instead of unknown, then not (P < 5)
would not be equivalent to P >= 5
• Three-valued logic using the truth value unknown:
• OR: (unknown or true) = true
(unknown or false) = unknown
(unknown or unknown) = unknown
• AND: (true and unknown) = unknown,
(false and unknown) = false,
(unknown and unknown) = unknown
• NOT: (not unknown) = unknown
• The result of a select predicate is treated as false if it evaluates to unknown
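The three-valued logic above can be written down as a small Python sketch, with `None` standing in for the truth value unknown. The function names and the sample rows are introduced for illustration only.

```python
UNKNOWN = None   # stand-in for the truth value unknown

def and3(a, b):
    if a is False or b is False:
        return False
    if a is UNKNOWN or b is UNKNOWN:
        return UNKNOWN
    return True

def or3(a, b):
    if a is True or b is True:
        return True
    if a is UNKNOWN or b is UNKNOWN:
        return UNKNOWN
    return False

def not3(a):
    return UNKNOWN if a is UNKNOWN else (not a)

def less_than(x, y):
    """A comparison with a null operand evaluates to unknown."""
    return UNKNOWN if x is None or y is None else x < y

def select(rows, predicate):
    """Keep a row only if the predicate is true; unknown counts as false."""
    return [row for row in rows if predicate(row) is True]

rows = [{"P": 3}, {"P": 7}, {"P": None}]   # hypothetical relation
below = select(rows, lambda r: less_than(r["P"], 5))
not_below = select(rows, lambda r: not3(less_than(r["P"], 5)))
print(below, not_below)
```

The row with a null P is returned by neither selection, since both P < 5 and not (P < 5) evaluate to unknown for it.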