DBMS NOTES(Relational Model)

The document discusses the relational model in database management systems, focusing on the importance of keys such as super keys, candidate keys, primary keys, and foreign keys in uniquely identifying records and establishing relationships between tables. It also outlines various constraints in DBMS, including domain, tuple uniqueness, and referential integrity constraints, which ensure data correctness. Additionally, the document presents Codd's 12 rules that define the requirements for a relational database management system.

Uploaded by

amanmalikup786
Database Management System

BCAC 0020
Topic: Relational Model

Presented by: Sanjiv Agrawal


Assistant Professor
Computer Engineering & Applications Department,
GLA University, Mathura

[email protected], +91-8923015116
Keys
• Keys play an important role in a relational database.
• They are used to uniquely identify any record (row) of data in a
table.
• They are also used to establish and identify relationships between
tables.
• For example: in the STUDENT table, ID is used as a key because it is
unique for each student. In the PERSON table, passport_number,
license_number, and SSN are keys since they are unique for each
person.
Types of Key
1. Super Key
• A super key is a set of attributes which can uniquely identify a tuple.
• A super key is a superset of a candidate key.
• A super key may have extraneous attributes.
• Adding zero or more attributes to a candidate key generates a
super key.
• Every candidate key is a super key, but the converse is not true.
• Super key values may also be NULL.
• For example: in the EMPLOYEE table, for (EMPLOYEE_ID,
EMPLOYEE_NAME), the names of two employees can be the same,
but their EMPLOYEE_ID can't be the same. Hence, this combination
can also be a key.
• The super keys would be EMPLOYEE_ID, (EMPLOYEE_ID,
EMPLOYEE_NAME), etc.
• A relation with N attributes has at most 2^N − 1 super keys
(one for each non-empty subset of its attributes).
2. Candidate Key
• A candidate key is a minimal set of attributes which can uniquely
identify a tuple.
• An appropriate candidate key is selected as the primary key.
• The remaining candidate keys are called alternate or secondary keys.
• The candidate keys are as strong as the primary key.
• It is a minimal super key: no attribute can be removed from it
without losing uniqueness.
• It can contain NULL values.
• Every table must have at least one candidate key.
• A table can have multiple candidate keys but only one primary
key.
• For example: in the EMPLOYEE table, ID is best suited for the
primary key. The other unique attributes, such as SSN,
Passport_Number, and License_Number, are also candidate keys.
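The definitions of super key and candidate key can be checked by brute force on a small table. The sketch below is only an illustration: the EMPLOYEE rows and attribute names are hypothetical, and it tests uniqueness on one sample instance, whereas a real key is a property of the schema that must hold for every valid instance.

```python
from itertools import combinations

# Hypothetical EMPLOYEE instance; names and values are illustrative only.
ATTRS = ("ID", "Name", "SSN")
ROWS = [
    (1, "Asha", "111"),
    (2, "Ravi", "222"),
    (3, "Asha", "333"),  # same name as employee 1
]

def is_superkey(attr_subset):
    """True if no two rows agree on every attribute in attr_subset."""
    idx = [ATTRS.index(a) for a in attr_subset]
    projected = [tuple(row[i] for i in idx) for row in ROWS]
    return len(set(projected)) == len(ROWS)

# All non-empty attribute subsets: 2^N - 1 of them for N attributes.
subsets = [c for r in range(1, len(ATTRS) + 1) for c in combinations(ATTRS, r)]
superkeys = [s for s in subsets if is_superkey(s)]

# A candidate key is a super key with no proper subset that is also a super key.
candidate_keys = [
    s for s in superkeys
    if not any(set(t) < set(s) for t in superkeys)
]

print(len(subsets), len(superkeys), candidate_keys)
```

In this sample, Name alone is not a super key, so the number of super keys stays below the 2^N − 1 upper bound; only ID and SSN come out as candidate keys.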
3. Primary Key
• It is used to uniquely identify one and only one instance of an
entity.
• An entity can have multiple keys.
• The most suitable of those keys becomes the primary key.
• A primary key can’t be NULL.
• A primary key need not be a single column; a combination of
columns can also serve as the primary key of a table.
• In the EMPLOYEE table, ID can be the primary key since it is
unique for each employee. We could also select License_Number
or Passport_Number as the primary key, since they are also unique.
Relationship between Super Key,
Candidate Key and Primary Key
4. Secondary or Alternate Key
• A secondary key in DBMS is a column or a set of columns that
uniquely identifies each row in a table but is not the primary key.
• Secondary keys are also called alternate keys because they can
be used in place of the primary key.
5. Unique Key
• Unique Key is a column or set of columns that uniquely
identify each record in a table.
• All values will have to be unique in this Key.
• A unique Key differs from a primary key because it can
have null value, whereas a primary Key cannot have
any null values.
6. Composite Key
• If no single attribute of a table can identify a row uniquely, we
combine two or more attributes to form a key.
• This is known as a composite key.

7. Foreign Key
• A foreign key is an attribute (or set of attributes) in one table
that refers to the primary key of another table.
• The foreign key is useful for linking tables.
• It relates two or more relations (tables) at a time.
• Foreign keys act as a cross-reference between the tables.
• Data should be entered in the foreign key column with great
care, as wrongly entered data can invalidate the relationship
between the two tables.
8. Surrogate Key
• A surrogate key is a database key that is not derived
from the natural key of the table.
• It is a unique identifier that is generated by the
database system.
• Surrogate keys are often used when the natural key
is not unique or when it is subject to change.
• In other words, a surrogate key is a type of primary
key.
9. Partial Key
• A partial key is a key that cannot, on its own, uniquely identify
all the records of a table.
• The set of attributes used to identify the entities of a weak
entity set is called the partial key.
• The partial key of a weak entity set is also known as a
discriminator.
• It is just a part of the key, since only a subset of the tuples
can be identified using it alone.
• It is partially unique and is combined with the key of the owning
strong entity set to uniquely identify the tuples.
Differences between Primary Key and
Secondary Key
Basis | Primary Key | Secondary Key
Definition | A key that is unique, not null, and is selected by the database administrator to uniquely identify tuples is called the primary key. | A key that uniquely identifies rows but is not selected as the primary key is known as a secondary key or alternate key.
NULL values | It cannot be NULL. | It can be NULL.
Number of keys | There must be one and only one primary key. | There can be zero or more secondary keys.
Prime and Non Prime Attributes
Prime or Key Attributes:
Attributes of a relation (table) that appear in at least
one of its candidate keys are called prime or
key attributes.
Non Prime or Non Key Attributes:
Attributes of a relation that do not appear in any of
its candidate keys are called non prime or non
key attributes.
Relational model concepts
• Relational data model is the primary data model,
which is used widely around the world for data
storage and processing.
• This model is simple and it has all the properties
and capabilities required to process data with
storage efficiency
Basic concepts of relational
data model
• Tables − In the relational data model, relations are saved in the
format of tables. This format stores the relation among entities.
A table has rows and columns, where rows represent records
and columns represent attributes.
• Tuple − A single row of a table, which contains a single record
for that relation is called a tuple.
• Relation instance − A finite set of tuples in the relational
database system represents relation instance. Relation instances
do not have duplicate tuples.
• Relation schema − A relation schema describes the relation
name (table name), attributes, and their names.
• Relation key − Each row has one or more attributes, known as
relation key, which can identify the row in the relation (table)
uniquely.
• Attribute domain − Every attribute has some pre-defined value
scope, known as attribute domain.
Constraints in DBMS
• Relational constraints are the restrictions imposed on the
database contents and operations.
• They ensure the correctness of data in the database.
Types of Constraints in DBMS
1. Domain Constraint
• Domain constraint defines the domain or set of values for an
attribute.
• It specifies that the value taken by the attribute must be the
atomic value from its domain.
Example
STU_ID Name Age
S001 Akshay 20
S002 Abhishek 21
S003 Shashank 20
S004 Rahul A
Here, value ‘A’ is not allowed since only integer values can be taken by the age
attribute.
2. Tuple Uniqueness Constraint
• Tuple uniqueness constraint specifies that all the tuples in any
relation must be unique.

Relation satisfying the constraint:
STU_ID Name Age
S001 Akshay 20
S002 Abhishek 21
S003 Shashank 20
S004 Rahul 20

Relation violating the constraint (the first two tuples are identical):
STU_ID Name Age
S001 Akshay 20
S001 Akshay 20
S003 Shashank 20
S004 Rahul 20
3. Key Constraint
• Key constraint specifies that in any relation-
– All the values of primary key must be unique.
– The value of primary key must not be null.

STU_ID Name Age
S001 Akshay 20
S001 Abhishek 21
S003 Shashank 20
S004 Rahul 20
Here, the STU_ID value S001 appears in two tuples, so the key constraint is
violated.
4. Entity Integrity Constraint
• Entity integrity constraint specifies that no attribute of a
primary key may contain a null value in any relation.
• This is because a tuple with a null value in its primary key
cannot be uniquely identified.

STU_ID Name Age
S001 Akshay 20
S002 Abhishek 21
S003 Shashank 20
(null) Rahul 20
Here, the last tuple has no STU_ID value, so entity integrity is violated.
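The domain, key, and entity integrity constraints above can be sketched as simple checks over a list of rows. This is a minimal illustration, not how a DBMS enforces constraints internally; the rows combine the faulty tuples from the slide examples, `None` stands in for NULL, and the function name `check_constraints` is hypothetical.

```python
# Illustrative STUDENT rows based on the examples above; None stands for NULL.
students = [
    {"STU_ID": "S001", "Name": "Akshay", "Age": 20},
    {"STU_ID": "S001", "Name": "Abhishek", "Age": 21},  # duplicate key value
    {"STU_ID": "S003", "Name": "Shashank", "Age": 20},
    {"STU_ID": None, "Name": "Rahul", "Age": "A"},       # NULL key, bad Age
]

def check_constraints(rows, key="STU_ID"):
    """Return a list of (row_index, message) for each violation found."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        # Domain constraint: Age must come from the integer domain.
        if not isinstance(row["Age"], int):
            problems.append((i, "domain: Age must be an integer"))
        # Entity integrity: the primary key must not be NULL.
        if row[key] is None:
            problems.append((i, "entity integrity: key is NULL"))
        # Key constraint: primary key values must be unique.
        elif row[key] in seen:
            problems.append((i, "key: duplicate key value"))
        else:
            seen.add(row[key])
    return problems

violations = check_constraints(students)
for index, message in violations:
    print(index, message)
```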
5. Referential Integrity Constraint
• This constraint is enforced when a foreign key references the
primary key of a relation.
• It specifies that all the values taken by the foreign key
must either be available in the relation of the primary
key or be null.

Important Results
– We cannot insert a record into a referencing relation if the
corresponding record does not exist in the referenced
relation.
– We cannot delete or update a record of the referenced
relation if the corresponding record exists in the
referencing relation.
• Here, relation ‘Student’ references the relation ‘Department’.

Student
STU_ID Name Dept_no
S001 Akshay D10
S002 Abhishek D10
S003 Shashank D11
S004 Rahul D14

Department
Dept_no Dept_name
D10 ASET
D11 ALS
D12 ASFL
D13 ASHS

The Dept_no value D14 of student S004 does not exist in Department, so the
referential integrity constraint is violated.
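A referential integrity check amounts to confirming that every non-null foreign key value appears among the referenced primary key values. The sketch below uses the Student and Department data from the example; the function name `dangling_references` is an assumption made for illustration.

```python
# Department(Dept_no -> Dept_name) and Student(STU_ID, Name, Dept_no) from the example.
departments = {"D10": "ASET", "D11": "ALS", "D12": "ASFL", "D13": "ASHS"}
students = [
    ("S001", "Akshay", "D10"),
    ("S002", "Abhishek", "D10"),
    ("S003", "Shashank", "D11"),
    ("S004", "Rahul", "D14"),
]

def dangling_references(referencing_rows, referenced_keys):
    """Return rows whose foreign key is neither NULL nor present in the referenced keys."""
    return [
        row for row in referencing_rows
        if row[2] is not None and row[2] not in referenced_keys
    ]

bad_rows = dangling_references(students, departments)
print(bad_rows)
```

A foreign key value of `None` is accepted here, matching the rule that a foreign key must either match a referenced key or be null.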
Referential Integrity Constraint Violation
• There are three possible causes of violation of the referential
integrity constraint:
• Cause-01: Insertion in a referencing relation
• Cause-02: Deletion from a referenced relation
• Cause-03: Update in a referenced relation
Cause-01: Insertion in a Referencing Relation
• It is allowed to insert only those values in the referencing
attribute which are already present in the referenced attribute.

Referencing relation:
Roll_no Name Age Branch_Code
1 Rahul 22 CS
2 Anjali 21 CS
3 Teena 20 IT

Referenced relation:
Branch_Code Branch_Name
CS Computer Science
EE Electronics Engineering
IT Information Technology
CE Civil Engineering

Inserting the following tuple into the referencing relation violates the
constraint, because ME is not present in the referenced relation:
Roll_no Name Age Branch_Code
4 James 23 ME
Cause-02: Deletion from a Referenced Relation
• It is not allowed to delete a row from the referenced relation if
the referencing attribute uses the value of the referenced
attribute of that row.

Referencing relation:
Roll_no Name Age Branch_Code
1 Rahul 22 CS
2 Anjali 21 CS
3 Teena 20 IT

Referenced relation:
Branch_Code Branch_Name
CS Computer Science
EE Electronics Engineering
IT Information Technology
CE Civil Engineering

• To handle this, we can simultaneously delete those tuples from
the referencing relation whose referencing attribute uses
the value of the referenced attribute being deleted. This
method of handling the violation is called On Delete Cascade.
OR
• We can abort (reject) the request for a deletion from the
referenced relation if the value is used by the referencing
relation.
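The two ways of handling a deletion from the referenced relation can be sketched as follows. This is a simplified model, not a DBMS implementation: the function `delete_branch` and the mode names "restrict" and "cascade" are assumptions chosen for illustration, and the data mirrors the example tables.

```python
branches = {
    "CS": "Computer Science",
    "EE": "Electronics Engineering",
    "IT": "Information Technology",
    "CE": "Civil Engineering",
}
students = [
    {"Roll_no": 1, "Name": "Rahul", "Age": 22, "Branch_Code": "CS"},
    {"Roll_no": 2, "Name": "Anjali", "Age": 21, "Branch_Code": "CS"},
    {"Roll_no": 3, "Name": "Teena", "Age": 20, "Branch_Code": "IT"},
]

def delete_branch(branches, students, code, on_delete="restrict"):
    """Delete branch `code`; return new (branches, students) without mutating the inputs."""
    if on_delete not in ("restrict", "cascade"):
        raise ValueError("on_delete must be 'restrict' or 'cascade'")
    referencing = [s for s in students if s["Branch_Code"] == code]
    if referencing and on_delete == "restrict":
        # Abort the request: the value is still used by the referencing relation.
        raise ValueError(f"branch {code!r} is still referenced by {len(referencing)} student(s)")
    # With 'cascade', the referencing tuples are deleted along with the branch.
    remaining_students = [s for s in students if s["Branch_Code"] != code]
    remaining_branches = {k: v for k, v in branches.items() if k != code}
    return remaining_branches, remaining_students

new_branches, new_students = delete_branch(branches, students, "CS", on_delete="cascade")
print(sorted(new_branches), [s["Name"] for s in new_students])
```

Deleting CE succeeds under either mode because no student references it, while deleting CS in "restrict" mode raises an error.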
Cause-03: Update in a Referenced Relation
• It is not allowed to update a row of the referenced relation if
the referencing attribute uses the value of the referenced
attribute of that row.

Referencing relation:
Roll_no Name Age Branch_Code
1 Rahul 22 CS
2 Anjali 21 CS
3 Teena 20 IT

Referenced relation (after changing CS to CSE):
Branch_Code Branch_Name
CSE Computer Science
EE Electronics Engineering
IT Information Technology
CE Civil Engineering

• To handle this, we can simultaneously update those tuples of the
referencing relation whose referencing attribute uses the
referenced attribute value being updated.
OR
• We can abort (reject) the request for an update of the
referenced relation if the value is used by the referencing
relation.
12 Codd's Rules
• A database that merely uses the relational data model is not
necessarily a Relational Database Management System (RDBMS).
• A set of rules defines what a database must satisfy to be a
true RDBMS.
• These rules were developed by Dr. Edgar F. Codd
(E.F. Codd) in 1985.
• There are 13 rules in all (numbered 0 to 12), popularly known
as Codd's 12 rules.
Rule 0: The Foundation Rule
The database must be in relational form. So that the system can handle the
database through its relational capabilities.
Rule 1: The Information Rule
All information, whether it is user information or metadata, that is stored in
a database must be entered as a value in a cell of a table. It is said that
everything within the database is organized in a table layout.
Rule 2: The Guaranteed Access Rule
Each data element is guaranteed to be accessible logically with a
combination of the table name, primary key (row value), and attribute name
(column value).
Rule 3: Systematic Treatment of NULL Values
Every Null value in a database must be given a systematic and uniform
treatment.
Rule 4: Active Online Catalog Rule
The database catalog, which contains metadata about the
database, must be stored and accessed using the same relational
database management system.

Rule 5: The Comprehensive Data Sublanguage Rule
The system must support at least one well-defined, easily
understandable language that facilitates defining, querying, and
modifying information within the database.

Rule 6: The View Updating Rule
All views that are theoretically updatable must also be updatable
by the system.
Rule 7: High-level Insert, Update, and Delete
The database system must support high-level insertions, updates,
and deletions, so that users can apply each of these operations to
a whole set of rows through a single query.

Rule 8: Physical Data Independence
Application programs and activities should remain unaffected when
changes are made to the physical storage structures or methods.

Rule 9: Logical Data Independence
Application programs and activities should remain unaffected when
changes are made to the logical structure of the data, such as
adding or modifying tables.
Rule 10: Integrity Independence
Integrity constraints should be specified separately from
application programs and stored in the catalog. They should be
automatically enforced by the database system.

Rule 11: Distribution Independence
The distribution of data across multiple locations should be invisible
to users, and the database system should handle the distribution
transparently.

Rule 12: Non-Subversion Rule
If the interface of the system provides access to low-level
records, then that interface must not be able to damage the system
or bypass security and integrity constraints.
Relational Algebra Overview
● Relational algebra is the basic set of operations for the
relational model.
● These operations enable a user to specify basic retrieval
requests (or queries)
● It consists of a set of operations that take one or two
relations as input and produce a new relation as its
output (result).
● A sequence of relational algebra operations forms a
relational algebra expression
● The result of a relational algebra expression is also a
relation that represents the result of a database query
(or retrieval request)
The Relational Algebra is a procedural query language.

Query Language:
● A language which is used to store and retrieve data
from a database is known as a query language. For
example – SQL.
● There are two types of query language:
1. Procedural query language
2. Non-procedural query language

[Figure: Query Language divides into Procedural Query Language (Relational Algebra, PL/SQL) and Non-Procedural Query Language (Relational Calculus, SQL).]

1. Procedural Query language:
In a procedural query language, the user instructs
the system to perform a series of operations to
produce the desired results. Here the user tells
what data to retrieve from the database and
how to retrieve it.

2. Non-procedural query language:
In a non-procedural query language, the user
instructs the system to produce the desired
result without telling the step by step process.
Here the user tells what data to retrieve from the
database but doesn’t tell how to retrieve it.

Note:
● Relational algebra and calculus are the theoretical concepts used on the
relational model.
● SQL is a practical implementation of relational algebra and calculus. It is a
non-procedural language.
Relational Algebra (RA) Operations
Fundamental Operations
• The select, project, and rename operations are called unary
operations, because they operate on one relation.
• The other three fundamental operations (union, set difference, and
Cartesian product) operate on pairs of relations and are,
therefore, called binary operations.
Extended RA Operations
• Generalized Projection (π)
• Aggregate Functions (G)
Database State for COMPANY
All examples discussed further will refer to the COMPANY database shown
here.
Unary Relational Operations: SELECT
The SELECT operation (denoted by σ (sigma)) is used to
select a subset of the tuples from a relation based on a
selection condition.
•The selection condition acts as a filter
•σP (r) : Keeps only those tuples from relation r, which
satisfy predicate/condition P.
•The predicate P will involve:-
•Attributes from Schema R of r
•Comparison Operators: <, >, ≤, ≥, =, ≠
•Logical Operators: ∧ (AND), ∨ (OR), ¬ (NOT)
Examples:
1. Select the EMPLOYEE tuples whose department
number is 4:
σ Dno=4 (EMPLOYEE)
2. Select the employee tuples whose salary is greater
than $30,000:
σ SALARY > 30,000 (EMPLOYEE)
3. Select those tuples pertaining to loans of more than
$1200 made by the Perryridge branch:
σbranch-name = “Perryridge” ∧ amount>1200 (loan)
SELECT Operation Properties :
1. The SELECT operation σ <selection condition>(R) produces a relation S
that has the same schema (same attributes) as R
2. SELECT σ is commutative:
σ <condition1>(σ <condition2>(R)) = σ <condition2>(σ <condition1>(R))
3. Because of commutativity property, a cascade (sequence) of
SELECT operations may be applied in any order:
σ <cond1>(σ <cond2>(σ <cond3>(R))) = σ <cond2>(σ <cond3>(σ <cond1>(R)))
4. A cascade of SELECT operations may be replaced by a single
selection with a conjunction of all the conditions:
σ <cond1>(σ <cond2>(σ <cond3>(R))) = σ <cond1> AND <cond2> AND <cond3>(R)
5. The number of tuples in the result of a SELECT is less than (or
equal to) the number of tuples in the input relation R
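The SELECT properties listed above can be tried out with a small model in which a relation is a list of dictionaries and a predicate is a function. This is a sketch under those assumptions; the EMPLOYEE rows are hypothetical and `select` is an illustrative helper, not part of any library.

```python
def select(relation, predicate):
    """sigma_predicate(relation): keep the tuples for which predicate is true."""
    return [t for t in relation if predicate(t)]

# Hypothetical EMPLOYEE tuples.
employee = [
    {"Name": "A", "Dno": 4, "Salary": 43000},
    {"Name": "B", "Dno": 4, "Salary": 25000},
    {"Name": "C", "Dno": 5, "Salary": 40000},
    {"Name": "D", "Dno": 5, "Salary": 30000},
]

def in_dept_4(t):
    return t["Dno"] == 4

def well_paid(t):
    return t["Salary"] > 30000

# Commutativity: the order of two selections does not matter.
order_1 = select(select(employee, well_paid), in_dept_4)
order_2 = select(select(employee, in_dept_4), well_paid)

# A cascade of selections equals one selection with the conditions ANDed.
combined = select(employee, lambda t: in_dept_4(t) and well_paid(t))

print(order_1 == order_2 == combined, [t["Name"] for t in combined])
```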
Unary Relational Operations: PROJECT
PROJECT Operation is denoted by π (pi). This operation keeps
certain columns (attributes) from a relation and discards the other
columns.
•PROJECT creates a vertical partitioning.
•The general form of the project operation is: π <attribute list>(R)
π (pi) is the symbol used to represent the project operation.
<attribute list> is the desired list of attributes from relation R.
•The project operation removes any duplicate tuples. This is because
the result of the project operation must be a set of tuples.
Mathematical sets do not allow duplicate elements.
•Example: To list each employee’s first and last name and salary, the
following is used:
π Fname, Lname, Salary (EMPLOYEE)
PROJECT Operation Properties
• The number of tuples in the result of projection π <list>(R) is
always less than or equal to the number of tuples in R.
• If the list of attributes includes a key of R, then the number of
tuples in the result of PROJECT is equal to the number of tuples
in R.
• PROJECT is not commutative.
• π <list1>(π <list2>(R)) = π <list1>(R) as long as <list2> contains the
attributes of <list1>.
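The PROJECT properties above follow from duplicate elimination, which the sketch below makes explicit. It assumes a relation is a list of dictionaries with hashable values; the EMPLOYEE rows and the helper `project` are hypothetical.

```python
def project(relation, attrs):
    """pi_attrs(relation): keep only attrs and drop duplicate tuples."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:          # set semantics: no duplicate tuples
            seen.add(row)
            result.append(dict(zip(attrs, row)))
    return result

# Hypothetical EMPLOYEE tuples; Ssn is a key.
employee = [
    {"Ssn": 1, "Dno": 4, "Salary": 43000},
    {"Ssn": 2, "Dno": 4, "Salary": 25000},
    {"Ssn": 3, "Dno": 5, "Salary": 40000},
    {"Ssn": 4, "Dno": 5, "Salary": 30000},
]

departments = project(employee, ["Dno"])        # duplicates collapse to two tuples
with_key = project(employee, ["Ssn", "Dno"])    # includes a key: no tuples are lost

print(departments, len(with_key))
```

Projecting `with_key` onto `["Dno"]` again gives the same result as projecting `employee` directly, which is the last property in the list.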
Unary Relational Operations: Examples

σ (Dno=4 AND Salary>25000) OR (Dno=5 AND Salary>30000) (EMPLOYEE)
Output: ??

π Lname, Fname, Salary (EMPLOYEE)
Output: ??
Unary Relational Algebra Expressions:
•We may want to apply several relational algebra
operations one after the other.
•Either we can write the operations as a single relational
algebra expression by nesting the operations, or
•We can apply one operation at a time and create
intermediate result relations.
•In the latter case, we must give names to the relations
that hold the intermediate results.
Single expression versus sequence of
relational operations (Example):
Example: To retrieve the first name, last name, and salary of all
employees who work in department number 5,
• We must apply a select and a project operation
• We can write a single relational algebra expression, also known as
an in-line expression, as follows:
π FNAME, LNAME, SALARY (σ DNO=5 (EMPLOYEE))
OR we can explicitly show the sequence of operations,
giving a name to each intermediate relation:
•DEP5_EMPS ← σ DNO=5 (EMPLOYEE)
•RESULT ← π FNAME, LNAME, SALARY (DEP5_EMPS)
Unary Relational Operations: RENAME
•The RENAME operator is denoted by ρ (rho)
•In some cases, we may want to rename the attributes of a relation
or the relation name or both.
•Useful when a query requires multiple operations
•Necessary in some cases (see JOIN operation later)
•The general RENAME operation ρ can be expressed by any of
the following forms:
•ρ S (B1, B2, …, Bn)(R) changes both:
the relation name to S, and
the column (attribute) names to B1, B2, …, Bn
•ρ S(R) changes:
the relation name only to S
•ρ (B1, B2, …, Bn)(R) changes:
the column (attribute) names only to B1, B2, …, Bn
Example:
ρ DEP5_EMPS (σ DNO=5 (EMPLOYEE))
ρ RESULT (First_name, Last_name, Salary) (π FNAME, LNAME, SALARY (DEP5_EMPS))
Relational Algebra Operations from Set
Theory :
•Type Compatibility of operands is required for the binary
set operation UNION ∪ , (also for INTERSECTION ∩, and
SET DIFFERENCE –, see next slides)
•R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) are type
compatible if:
•they have the same number of attributes, and
•the domains of corresponding attributes are type
compatible
(i.e. dom(Ai)=dom(Bi) for i=1, 2, ..., n).
•The resulting relation for R1 ∪ R2 (also for R1 ∩ R2, or R1–
R2, see next slides) has the same attribute names as the
first operand relation R1 (by convention)
Relational Algebra Operations from Set Theory:
UNION
•Binary operation, denoted by ∪
•The result of R ∪ S, is a relation that includes all tuples that
are either in R or in S or in both R and S
•Duplicate tuples are eliminated
•The two operand relations R and S must be “type
compatible” (or UNION compatible):
• R and S must have same number of attributes
• Each pair of corresponding attributes must be type
compatible (have same or compatible domains)
Relational Algebra Operations from Set
Theory : INTERSECTION
•INTERSECTION is denoted by ∩
•The result of the operation R ∩ S, is a relation that includes
all tuples that are in both R and S
•The attribute names in the result will be the same as the
attribute names in R
•The two operand relations R and S must be “type
compatible”
Relational Algebra Operations from Set
Theory : SET DIFFERENCE
•SET DIFFERENCE (also called MINUS or EXCEPT) is denoted
by –
•The result of R – S, is a relation that includes all tuples that
are in R but not in S
•The attribute names in the result will be the same as the
attribute names in R
•The two operand relations R and S must be “type
compatible”
Relational Algebra Operations from Set
Theory :
Example: The set operations UNION, INTERSECTION, and MINUS.
(a) Two union-compatible relations.
(b) STUDENT ∪ INSTRUCTOR.
(c) STUDENT ∩ INSTRUCTOR.
(d) STUDENT – INSTRUCTOR.
(e) INSTRUCTOR – STUDENT
Properties of UNION, INTERSECT, and
DIFFERENCE
• Notice that both union and intersection are commutative
operations; that is
R ∪ S = S ∪ R, and R ∩ S = S ∩ R
• Both union and intersection can be treated as n-ary
operations applicable to any number of relations as both
are associative operations; that is
R ∪ (S ∪ T) = (R ∪ S) ∪ T
(R ∩ S) ∩ T = R ∩ (S ∩ T)
• The minus operation is not commutative; that is, in
general
R–S≠S–R
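Because relations are sets of tuples, these properties can be observed directly with Python's built-in set type. The STUDENT and INSTRUCTOR tuples below are hypothetical sample data with two type-compatible attributes (first name, last name).

```python
# Two type-compatible relations, modelled as sets of (first_name, last_name) tuples.
student = {("Susan", "Yao"), ("Ramesh", "Shah"), ("Johnny", "Kohler")}
instructor = {("John", "Smith"), ("Susan", "Yao"), ("Ramesh", "Shah")}

union = student | instructor            # STUDENT ∪ INSTRUCTOR, duplicates eliminated
intersection = student & instructor     # STUDENT ∩ INSTRUCTOR
only_students = student - instructor    # STUDENT – INSTRUCTOR
only_instructors = instructor - student # INSTRUCTOR – STUDENT

# Union and intersection are commutative; difference is not.
print(union == instructor | student)
print(intersection == instructor & student)
print(only_students != only_instructors)
```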
The Relational Algebra-II (JOINS)
1. Cartesian Product (Cross Join)
2. Join Operations
i. Inner Join
a. Theta Join
b. EQUI Join
c. Natural Join
ii. Outer Join
a. Left Outer Join
b. Right Outer Join
c. Full Outer Join
Introduction – Cartesian Product
• Cartesian Product in DBMS is an operation used to
combine tuples from two relations in a combinatorial
fashion.
• It is also called Cross Product or Cross Join.
• Very rare in practice; mainly used to express joins.
• Denoted by R(A1, A2, . . ., An) × S(B1, B2, . . ., Bm).
• If relation R has nR tuples and relation S has nS tuples,
R × S has nR * nS tuples.
• Result is a relation Q with degree n + m attributes:
Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order.
• The two operands do NOT have to be "type compatible”.

Cartesian Product (Contd.)
[Figure: worked example of a Cartesian product; the result relation has attributes A, B, C, D, E.]
Join Operation
The sequence of CARTESIAN PRODUCT followed by SELECT
is used quite commonly to identify and select related tuples
from two relations. A special operation, called JOIN (denoted
by ⋈), combines this sequence into a single operation.
• This operation is very important for any relational
database with more than a single relation, because it
allows us to combine related tuples from various relations
• The general form of a join operation on two relations
R(A1, A2, . . ., An) and S(B1, B2, . . ., Bm) is:
R ⋈<join condition> S
where R and S can be any relations that result from
general relational algebra expressions.
Properties of JOIN
Consider the following JOIN operation:
R(A1, A2, . . ., An) ⋈ R.Ai=S.Bj S(B1, B2, . . ., Bm)
Result is a relation Q with degree n + m attributes:
Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order.
• The resulting relation state has one tuple for each
combination of tuples (r from R and s from S), but only if
they satisfy the join condition r[Ai]=s[Bj]
• Hence, if R has nR tuples, and S has nS tuples, then the
join result will generally have fewer than nR * nS tuples.
• Only related tuples (based on the join condition) will
appear in the result
Join Operation (Contd.)
Example: Suppose that we want to retrieve the name of the
manager of each department.
• To get the manager’s name, we need to combine each
DEPARTMENT tuple with the EMPLOYEE tuple whose SSN
value matches the MGRSSN value in the department
tuple.
• We do this by using the join operation.
• DEPT_MGR ← DEPARTMENT ⋈ MGRSSN=SSN EMPLOYEE
• MGRSSN=SSN is the join condition
• Combines each department record with the employee
who manages the department
• The join condition can also be specified as
DEPARTMENT.MGRSSN= EMPLOYEE.SSN
Types of Joins
There are mainly two types of joins in DBMS:
1. Inner Joins:
a) Theta
b) Natural
c) EQUI
2. Outer Join:
a) Left
b) Right
c) Full
Theta Join
THETA JOIN allows you to merge two tables based on the
condition represented by theta (θ).
It is the most general join operation.
It is the result of performing a SELECT operation on the Cartesian product.
Theta joins work for all comparison operators.
It is denoted by the symbol ⋈θ.
The general form of a theta join operation on two relations
R(A1, A2, . . ., An) and S(B1, B2, . . ., Bm) is:
R ⋈θ S
where R and S can be any relations that result from general
relational algebra expressions.
• Algebraic Rule:
R1 ⋈θ R2 = σθ (R1 × R2)
Logically, the join of R1 and R2 with the condition theta is the same as the
selection, with the condition theta, over the cross product of R1 and R2.
Theta Join
Example: Display the name of the customer along with the age group in
which his/her age lies.

πcustomer_name, age_group(σage >= min_age ^ age <= max_age(customer X age_group))
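The age-group query can be sketched by computing the theta join literally as a selection over the Cartesian product. The sample customers and age groups are hypothetical, `theta_join` is an illustrative helper, and the sketch assumes the two relations have no attribute names in common.

```python
from itertools import product

def theta_join(r, s, theta):
    """R join_theta S computed literally as sigma_theta(R x S)."""
    return [{**t, **u} for t, u in product(r, s) if theta(t, u)]

# Hypothetical sample data for the customer and age_group relations.
customer = [
    {"customer_name": "Anu", "age": 25},
    {"customer_name": "Bala", "age": 42},
]
age_group = [
    {"age_group": "young", "min_age": 18, "max_age": 30},
    {"age_group": "middle", "min_age": 31, "max_age": 50},
]

# A condition that is always true yields the plain Cartesian product.
cross = theta_join(customer, age_group, lambda c, g: True)

# Theta condition: age >= min_age AND age <= max_age.
joined = theta_join(
    customer, age_group,
    lambda c, g: g["min_age"] <= c["age"] <= g["max_age"],
)

# Final projection onto customer_name and age_group.
result = [(t["customer_name"], t["age_group"]) for t in joined]
print(len(cross), result)
```

Real systems avoid materialising the full product; this form is only meant to mirror the algebraic definition.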


EQUI JOIN
• The most common use of join involves join conditions with
equality comparisons only.
• EQUI JOIN is a Theta join that uses only the equality condition.
• Joins are among the most difficult operations to implement
efficiently in an RDBMS, and one reason why RDBMSs can have
performance problems.
• Such a join, where the only comparison operator used is (=), is
called an EQUIJOIN.
• In the result of an EQUIJOIN we always have one or more
pairs of attributes (whose names need not be identical) that
have identical values in every tuple.
Relational Syntax:
EQUIJOIN
Example: Show the details of the student and the subject registered by him.
Natural Join
Another variation of JOIN called NATURAL JOIN (denoted by *)
was created to get rid of the second (superfluous) attribute in an
EQUIJOIN condition, because one of each pair of attributes with
identical values is superfluous.

The standard definition of natural join requires that the two join
attributes, or each pair of corresponding join attributes, have
the same name in both relations.
If this is not the case, a renaming operation is applied first.
We can perform a Natural Join only if there is at least one common
attribute that exists between two relations.
A natural join is a shorthand equi-join where the equality is applied to
all shared attributes.
Here, attrs(R) ∩ attrs(S) denotes the set of attributes common to R and S.


Natural Join
Example 1: To apply a natural join on the DNUMBER attributes of
DEPARTMENT and DEPT_LOCATIONS, it is sufficient to write:
DEPT_LOCS ← DEPARTMENT * DEPT_LOCATIONS
The only attribute with the same name is DNUMBER.
An implicit join condition is created based on this attribute:
DEPARTMENT.DNUMBER=DEPT_LOCATIONS.DNUMBER

Example 2: Q ← R(A,B,C,D) * S(C,D,E)
The implicit join condition includes each pair of attributes with the
same name, “AND”ed together:
R.C=S.C AND R.D=S.D
Result keeps only one attribute of each such pair:
Q(A,B,C,D,E)
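Example 2 can be reproduced with a short sketch that joins on every shared attribute name and keeps one copy of each. The tuple values are hypothetical, and `natural_join` is an illustrative helper that assumes every tuple of a relation has the same attributes.

```python
def natural_join(r, s):
    """R * S: equi-join on all attributes with the same name, keeping one copy of each."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [
        {**t, **u}                      # shared attributes appear once in the result
        for t in r for u in s
        if all(t[a] == u[a] for a in common)
    ]

# Hypothetical tuples for R(A, B, C, D) and S(C, D, E).
R = [
    {"A": 1, "B": 2, "C": 3, "D": 4},
    {"A": 5, "B": 6, "C": 7, "D": 8},
]
S = [
    {"C": 3, "D": 4, "E": 9},
    {"C": 3, "D": 0, "E": 10},   # matches on C but not on D
]

Q = natural_join(R, S)
print(Q)
```

Only the pair that agrees on both C and D survives, and the result has the schema Q(A, B, C, D, E).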
Natural Join

Courses * HoD
Outer Join Overview
The Outer Join operation is an extension of the JOIN operation to
deal with missing values. Let’s consider two tables, employee and
ft-works.
[Figure: the employee and ft-works relations.]
Objective: Generate a single relation with all the information
(employee-name, street, city, branch name, and salary) about full-
time employees.
Solution: A possible approach is the natural join operation.
[Figure: output of the natural join of employee and ft-works.]
But we have lost the street and city information about Smith. Similarly, we
have lost the branch name and salary information about Gates.
Outer Join contd..
We can use the outer-join operation to avoid this loss of information.
There are actually three forms of the operation:
●Left outer join, denoted as ⟕
●Right outer join, denoted as ⟖
●Full outer join, denoted as ⟗
All three forms of outer join compute the join, and add extra tuples
to the result of the join.
Left outer join (⟕)
The left outer join (⟕) takes all tuples in the left relation that
did not match with any tuple in the right relation, pads the
tuples with null values for all other attributes from the right
relation, and adds them to the result of the natural join.
[Figure: employee ⟕ ft-works; the result contains all tuples from the left table.]
Right outer join (⟖)
The right outer join (⟖) takes all tuples in the right relation
that did not match with any tuple in the left relation, pads the
tuples with null values for all other attributes from the left
relation, and adds them to the result of the natural join.
[Figure: employee ⟖ ft-works; the result contains all tuples from the right table.]
Full outer join (⟗)
The full outer join (⟗) does both of those operations, padding
tuples from the left relation that did not match any from the
right relation, as well as tuples from the right relation that did
not match any from the left relation, and adding them to the
result of the join.
[Figure: employee ⟗ ft-works; the result contains all tuples from both tables.]
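The three outer joins can be sketched as a natural join on one shared attribute followed by padding of the unmatched tuples. The rows below are hypothetical stand-ins for the employee and ft-works relations (only Smith and Gates are named in the text), `None` plays the role of NULL, and the helper `outer_join` assumes both inputs are non-empty and share exactly one attribute, `key`.

```python
def outer_join(r, s, key, how="left"):
    """Join r and s on `key`, then add padded unmatched tuples per `how`."""
    def padding(attrs):
        # None stands for NULL in the attributes of the other relation.
        return {a: None for a in attrs if a != key}

    r_attrs, s_attrs = list(r[0]), list(s[0])
    out = [{**t, **u} for t in r for u in s if t[key] == u[key]]
    if how in ("left", "full"):
        out += [{**t, **padding(s_attrs)}
                for t in r if all(t[key] != u[key] for u in s)]
    if how in ("right", "full"):
        out += [{**padding(r_attrs), **u}
                for u in s if all(t[key] != u[key] for t in r)]
    return out

# Hypothetical sample rows.
employee = [
    {"name": "Coyote", "street": "Toon", "city": "Hollywood"},
    {"name": "Smith", "street": "Revolver", "city": "Death Valley"},
]
ft_works = [
    {"name": "Coyote", "branch": "Mesa", "salary": 1500},
    {"name": "Gates", "branch": "Redmond", "salary": 5300},
]

left = outer_join(employee, ft_works, "name", how="left")
right = outer_join(employee, ft_works, "name", how="right")
full = outer_join(employee, ft_works, "name", how="full")
print([t["name"] for t in full])
```

Smith survives in the left and full outer joins with NULL branch and salary, and Gates survives in the right and full outer joins with NULL street and city.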
Can we answer?
● Using following table how many number of tuples will be
returned for the following relational algebra expression:

A. 10
B. 15
C. 11
D. 12
Binary relation operation: DIVISION
DIVISION Operation:
● The division operation is applied to two relations.
● R(Z) ÷ S(X), where X ⊆ Z. Let Y = Z − X (and hence Z = X ∪ Y);
that is, let Y be the set of attributes of R that are not attributes of
S.
● The result of DIVISION is a relation T(Y) that includes a tuple t if
tuples tR appear in R with tR[Y] = t, and with tR[X] = tS for every
tuple tS in S.
● For a tuple t to appear in the result T of the DIVISION, the values
in t must appear in R in combination with every tuple in S.
Example of DIVISION operation

Can we answer Division A/B?

A
sno pno
s1 p1
s1 p2
s1 p3
s1 p4
s2 p1
s2 p2
s3 p2
s4 p2
s4 p4

B1
pno
p2

B2
pno
p2
p4

B3
pno
p1
p2
p4

A/B1
sno
s1
s2
s3
s4

A/B2
sno
s1
s4

A/B3
sno
s1
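The three quotients can be verified with a direct translation of the definition: an sno belongs to A/B if it appears in A together with every pno in B. The sketch below models A as a set of (sno, pno) pairs and B as a set of pno values; `divide` is an illustrative helper specialised to this two-attribute case.

```python
def divide(a, b):
    """A / B for A(sno, pno) and B(pno): snos paired in A with every pno of B."""
    snos = {sno for sno, _ in a}
    return {sno for sno in snos if all((sno, pno) in a for pno in b)}

A = {
    ("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
    ("s2", "p1"), ("s2", "p2"),
    ("s3", "p2"),
    ("s4", "p2"), ("s4", "p4"),
}
B1 = {"p2"}
B2 = {"p2", "p4"}
B3 = {"p1", "p2", "p4"}

for name, b in (("A/B1", B1), ("A/B2", B2), ("A/B3", B3)):
    print(name, sorted(divide(A, b)))
```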
Write RA using Division operator?
Q1. List of courses in which all students are registered.
Q2. List of courses in which all ‘ECMP’ major students
are registered.
Aggregate Functions
● A type of request that cannot be expressed in the basic
relational algebra is to specify mathematical aggregate
functions on collections of values from the database.
● Examples of such functions include retrieving the average or
total salary of all employees or the total number of employee
tuples.
○ These functions are used in simple statistical queries that
summarize information from the database tuples.
● Common functions applied to collections of numeric values
include
○ SUM, AVERAGE, MAXIMUM, and MINIMUM.

● The COUNT function is used for counting tuples or values.

Basic Aggregate functions operations
Use of the Aggregate Functional operation ℱ:
● Retrieves the maximum salary value from the EMPLOYEE relation.
○ ℱ MAX Salary (EMPLOYEE)
● Retrieves the minimum Salary value from the EMPLOYEE relation.
○ ℱ MIN Salary (EMPLOYEE)
● Retrieves the sum of the Salary from the EMPLOYEE relation.
○ ℱ SUM Salary (EMPLOYEE)
● Computes the count (number) of employees and their average salary.
○ ℱ COUNT SSN, AVERAGE Salary (EMPLOYEE)
■ Note: count just counts the number of rows, without removing duplicates
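The aggregate operations listed above map directly onto Python built-ins applied to one column of a relation. The EMPLOYEE rows are hypothetical; two employees share the same salary to show that duplicate values are not removed before aggregating.

```python
# Hypothetical EMPLOYEE tuples; two employees earn the same salary.
employee = [
    {"SSN": "111", "Salary": 30000},
    {"SSN": "222", "Salary": 40000},
    {"SSN": "333", "Salary": 25000},
    {"SSN": "444", "Salary": 25000},
]

salaries = [e["Salary"] for e in employee]   # a column, duplicates kept

max_salary = max(salaries)                   # MAX Salary (EMPLOYEE)
min_salary = min(salaries)                   # MIN Salary (EMPLOYEE)
total_salary = sum(salaries)                 # SUM Salary (EMPLOYEE)
employee_count = len([e["SSN"] for e in employee])   # COUNT SSN (EMPLOYEE)
average_salary = total_salary / employee_count       # AVERAGE Salary (EMPLOYEE)

print(max_salary, min_salary, total_salary, employee_count, average_salary)
```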
