Dbms Unit III 2 56
Unit-III
SQL: QUERIES, CONSTRAINTS, TRIGGERS: form of basic SQL query, UNION,
INTERSECT, and EXCEPT, Nested Queries, aggregation operators, NULL values, complex
integrity constraints in SQL, triggers and active databases.
Schema Refinement: Problems caused by redundancy, decompositions, problems related to
decomposition, reasoning about functional dependencies, FIRST, SECOND, THIRD normal
forms, BCNF, lossless join decomposition, multi-valued dependencies, FOURTH normal form,
FIFTH normal form.
SQL: SQL (Structured Query Language) is the most widely used commercial database language. It
was developed by IBM in the 1970s and has several aspects:
DDL (Data Definition Language): this subset of SQL supports the creation, deletion, and
modification of tables and views, and allows integrity constraints to be specified on tables.
DML (Data Manipulation Language): this subset allows the user to pose queries and to insert, delete,
and modify rows in a table.
Embedded and Dynamic SQL: embedded SQL features allow the user to call SQL code from a
host language such as C or COBOL. Dynamic SQL allows a query to be constructed at run time.
Triggers: the SQL:1999 standard supports triggers, which are actions executed by the
DBMS whenever changes to the database meet the conditions specified in the trigger.
Security: SQL provides mechanisms to control access to data objects such as tables and
views.
Transaction management: SQL allows a user to explicitly control how a transaction is to be
executed.
SQL is a programming language for relational databases. It is based on relational algebra
and tuple relational calculus. SQL comprises both a data definition and a data manipulation
language. Using the data definition features of SQL, one can design and modify a database
schema, whereas the data manipulation features allow one to store and retrieve data from the
database.
The basic form of SQL query is
SELECT [Distinct] Select-list
FROM From-list
WHERE Condition;
Example: an SQL query to get the names of students who scored more than 60 marks.
SELECT sname
FROM student
WHERE marks > 60;
This query corresponds to a relational algebra expression involving selection, projection, and
cross product.
The SELECT clause specifies which columns are to be retained in the result.
Select-list: a list of column names (or expressions over them).
The FROM clause specifies the cross product of tables.
From-list: a list of table names.
The WHERE clause specifies the selection condition on the tables mentioned in the FROM clause.
Conceptual evaluation strategy:
Compute the cross product of the tables in the from-list.
Delete the rows of the cross product that fail the condition.
Delete the columns that do not appear in the select-list.
If DISTINCT is specified, eliminate duplicate rows.
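The four steps of the conceptual evaluation strategy can be sketched in a few lines of Python. This is only an illustration of the semantics, not how a DBMS actually executes a query; the function name evaluate and the sample rows are assumptions made for this sketch.

```python
from itertools import product

def evaluate(tables, condition, select_list, distinct=False):
    """Conceptual evaluation of SELECT [DISTINCT] ... FROM ... WHERE ..."""
    # Step 1: cross product of the tables in the from-list.
    rows = []
    for combo in product(*tables):
        merged = {}
        for row in combo:
            merged.update(row)
        rows.append(merged)
    # Step 2: discard rows that fail the WHERE condition.
    rows = [r for r in rows if condition(r)]
    # Step 3: keep only the columns named in the select-list.
    result = [tuple(r[c] for c in select_list) for r in rows]
    # Step 4: eliminate duplicates only when DISTINCT is requested.
    if distinct:
        result = list(dict.fromkeys(result))
    return result

# Hypothetical sample data; column names are qualified to avoid clashes.
sailors = [{"S.sid": 22, "S.sname": "Dustin"},
           {"S.sid": 31, "S.sname": "Lubber"},
           {"S.sid": 58, "S.sname": "Rusty"}]
reserves = [{"R.sid": 22, "R.bid": 101},
            {"R.sid": 22, "R.bid": 102},
            {"R.sid": 58, "R.bid": 103}]

join_condition = lambda r: r["S.sid"] == r["R.sid"]
names = evaluate([sailors, reserves], join_condition, ["S.sname"])
distinct_names = evaluate([sailors, reserves], join_condition, ["S.sname"], distinct=True)
```

Without DISTINCT, Dustin appears once per reservation, which is why the result of an SQL query is a multiset of rows.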
6. Find the names of sailors who have reserved at least one boat.
SELECT S.sname
FROM Sailors S, Reserves R
WHERE S.sid = R.sid;
7. Find the names of sailors who have reserved at least two boats on the same day.
SELECT S.sname
FROM Sailors S, Reserves R1, Reserves R2
WHERE S.sid = R1.sid AND R1.sid = R2.sid AND R1.bid <> R2.bid
AND R1.day = R2.day;
8. Find the ages of sailors whose name begins and ends with B and has at least three characters.
SELECT S.age
FROM Sailors S
WHERE S.sname LIKE 'B_%B';
9. Compute increments for the ratings of persons who have sailed two different boats on the
same day.
SELECT S.sname, S.rating+1 AS rating
FROM Sailors S, Reserves R1, Reserves R2
WHERE S.sid = R1.sid AND S.sid = R2.sid
AND R1.day = R2.day AND R1.bid <> R2.bid
Set operators:
SQL provides three set-manipulation constructs (UNION, INTERSECT, and EXCEPT) that
extend the basic form of an SQL query. Because the answer to a query is a multiset of rows,
these constructs can be used to combine the results of two (or more) SELECT statements.
SQL also provides other set operations for use with nested queries: IN, op ANY, op ALL, and
EXISTS, together with their negations NOT IN and NOT EXISTS.
UNION: the UNION operator returns the combined result of all the compounded
SELECT queries after removing duplicates. Two NULL values are treated as duplicates of each
other for this purpose. Many systems return the rows in sorted order as a side effect of
duplicate elimination, but the order is not guaranteed unless an ORDER BY clause is given.
Example: Find the names of sailors who have reserved a red or a green boat.
SELECT S.sname
FROM Sailors S, Reserves R, Boats B
WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red'
UNION
SELECT S2.sname
FROM Sailors S2, Boats B2, Reserves R2
WHERE S2.sid = R2.sid AND R2.bid = B2.bid AND B2.color = 'green'
Example: Find the sids of all sailors who have a rating of 10 or have reserved boat 104.
SELECT S.sid
FROM Sailors S
WHERE S.rating = 10
UNION
SELECT R.sid
FROM Reserves R
WHERE R.bid = 104
INTERSECT: the INTERSECT operator returns the rows common to both SELECT
statements, with no duplicates. As with UNION, the order of the rows is not guaranteed.
Example: Find the names of sailors who have reserved a red and a green boat.
SELECT S.sname
FROM Sailors S, Reserves R, Boats B
WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red'
INTERSECT
SELECT S2.sname
FROM Sailors S2, Boats B2, Reserves R2
WHERE S2.sid = R2.sid AND R2.bid = B2.bid AND B2.color = 'green'
EXCEPT: it returns the rows that are present in the result of the first query but not in the
result of the second query, with no duplicates. Again, the order of the rows is not guaranteed.
Example: Find the sids of all sailors who have reserved red boats but not green boats.
SELECT S.sid
FROM Sailors S, Reserves R, Boats B
WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red'
EXCEPT
SELECT S2.sid
FROM Sailors S2, Reserves R2, Boats B2
WHERE S2.sid = R2.sid AND R2.bid = B2.bid AND B2.color = 'green'
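The three set operators can be tried out with Python's built-in sqlite3 module. The sketch below uses a small, made-up instance of Boats and Reserves (the rows are assumptions, not data from these notes) and sorts the results in Python rather than relying on any ordering from the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Boats (bid INTEGER PRIMARY KEY, color TEXT);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
INSERT INTO Boats VALUES (102, 'red'), (103, 'green'), (104, 'red');
INSERT INTO Reserves VALUES (22, 102, '10/10'), (22, 103, '10/11'),
                            (31, 102, '11/10'), (58, 103, '11/12');
""")

# sids of sailors who reserved a red boat, and of those who reserved a green one.
RED = ("SELECT R.sid FROM Reserves R, Boats B "
       "WHERE R.bid = B.bid AND B.color = 'red'")
GREEN = ("SELECT R2.sid FROM Reserves R2, Boats B2 "
         "WHERE R2.bid = B2.bid AND B2.color = 'green'")

def run(set_operator):
    """Combine the two queries with the given operator and return sorted sids."""
    query = f"{RED} {set_operator} {GREEN}"
    return sorted(row[0] for row in conn.execute(query))

red_or_green = run("UNION")        # reserved a red or a green boat
red_and_green = run("INTERSECT")   # reserved both a red and a green boat
red_not_green = run("EXCEPT")      # reserved a red boat but no green boat
```

Working with sid rather than sname avoids the problem that two different sailors may share a name.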
Nested Queries: A nested query is a query that has another query embedded within it; the
embedded query is called a subquery. Conceptually, the inner query is evaluated first and the
outer query is then evaluated using its result (a correlated subquery is re-evaluated for each
row of the outer query). A subquery typically appears within the WHERE clause of a query, and
sometimes in the FROM clause or the HAVING clause.
Example: Find the names of sailors who have reserved boat 103.
SELECT S.sname
FROM Sailors S
WHERE S.sid IN (SELECT R.sid
FROM Reserves R
WHERE R.bid = 103)
Example: Find the names of sailors who have reserved a red boat.
SELECT S.sname
FROM Sailors S
WHERE S.sid IN ( SELECT R.sid
FROM Reserves R
WHERE R.bid IN ( SELECT B.bid
FROM Boats B
WHERE B.color = 'red' ))
Example: Find the names of sailors who have not reserved a red boat.
SELECT S.sname
FROM Sailors S
WHERE S.sid NOT IN ( SELECT R.sid
FROM Reserves R
WHERE R.bid IN ( SELECT B.bid
FROM Boats B
WHERE B.color = 'red' ))
The SOME operator (a synonym for ANY) is generally used in the WHERE clause to check whether a
column value satisfies a comparison with at least one of the values returned by a subquery.
Example: Write a query to find the sailors whose rating is greater than 8.
Example: Find sailors whose rating is better than some sailor called Horatio.
SELECT S.sid
FROM Sailors S
WHERE S.rating > ANY (SELECT S2.rating
FROM Sailors S2
WHERE S2.sname = 'Horatio')
The subquery computes the ratings of all sailors called Horatio. The outer WHERE condition is
satisfied when S.rating is greater than at least one of these rating values.
Example: Find the sailors with the highest rating.
SELECT S.sid
FROM Sailors S
WHERE S.rating >= ALL (SELECT S2.rating
FROM Sailors S2)
The subquery computes all rating values in Sailors. The outer WHERE condition is
satisfied when S.rating is greater than or equal to each of these rating values.
Example: Find the names of sailors who have reserved both a red and a green boat.
SELECT S3.sname
FROM Sailors S3
WHERE S3.sid IN ((SELECT R.sid
FROM Boats B, Reserves R
WHERE R.bid = B.bid AND B.color = 'red')
INTERSECT
(SELECT R2.sid
FROM Boats B2, Reserves R2
WHERE R2.bid = B2.bid AND B2.color = 'green'))
Example: Find the names of sailors who have reserved all boats.
SELECT S.sname
FROM Sailors S
WHERE NOT EXISTS (( SELECT B.bid
FROM Boats B )
EXCEPT
(SELECT R.bid
FROM Reserves R
WHERE R.sid = S.sid ))
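The nested queries above can be checked against a tiny database with Python's sqlite3 module. This is a sketch under stated assumptions: the rows are invented for illustration, and the inner parentheses around each SELECT of the EXCEPT are dropped because SQLite does not accept them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT);
CREATE TABLE Boats (bid INTEGER PRIMARY KEY, color TEXT);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER);
INSERT INTO Sailors VALUES (22, 'Dustin'), (31, 'Lubber'), (58, 'Rusty');
INSERT INTO Boats VALUES (102, 'red'), (103, 'green');
INSERT INTO Reserves VALUES (22, 102), (22, 103), (31, 102);
""")

def names(query):
    """Run a query that returns one column of names and sort the result."""
    return sorted(row[0] for row in conn.execute(query))

# IN: sailors who have reserved boat 103.
reserved_103 = names("""
    SELECT S.sname FROM Sailors S
    WHERE S.sid IN (SELECT R.sid FROM Reserves R WHERE R.bid = 103)""")

# NOT IN with two levels of nesting: sailors who have not reserved a red boat.
no_red = names("""
    SELECT S.sname FROM Sailors S
    WHERE S.sid NOT IN (SELECT R.sid FROM Reserves R
                        WHERE R.bid IN (SELECT B.bid FROM Boats B
                                        WHERE B.color = 'red'))""")

# Correlated NOT EXISTS ... EXCEPT: sailors for whom no boat is left unreserved.
reserved_all = names("""
    SELECT S.sname FROM Sailors S
    WHERE NOT EXISTS (SELECT B.bid FROM Boats B
                      EXCEPT
                      SELECT R.bid FROM Reserves R WHERE R.sid = S.sid)""")
```

The last query is correlated: the subquery mentions S.sid, so it is conceptually re-evaluated for every sailor.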
Aggregate Operators:
These operators perform a calculation over multiple rows of a column of a
table. SQL supports five aggregate operations, which can be applied to any column A of a relation:
COUNT ([DISTINCT] A): returns the number of (unique) values in column A.
Example: Count the number of sailors.
SELECT COUNT (*)
FROM Sailors S
Example: Count the number of different sailor names.
SELECT COUNT (DISTINCT S.sname)
FROM Sailors S
SUM (A): it returns the sum of all the values in column A.
Example: Find the sum of ages of all sailors.
SELECT SUM (S.age)
FROM Sailors S
AVG (A): it returns the average of all values in column A.
Example: Find the average age of sailors with a rating of 10.
SELECT AVG (S.age)
FROM Sailors S
WHERE S.rating = 10
MAX (A): it returns the maximum value in the column A.
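Example: Find the name and age of the oldest sailor.
SELECT S.sname, MAX (S.age)
FROM Sailors S
This query is illegal in standard SQL: if the SELECT clause uses an aggregate operation, it must
use only aggregate operations unless the query contains a GROUP BY clause. A nested query
gives the intended answer:
SELECT S.sname, S.age
FROM Sailors S
WHERE S.age = (SELECT MAX (S2.age)
FROM Sailors S2 )
MIN (A): it returns the minimum value in column A.
Example: Find the name and age of the youngest sailor.
SELECT S.sname, S.age
FROM Sailors S
WHERE S.age = (SELECT MIN (S2.age)
FROM Sailors S2 )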
Example: Find the names of sailors who are older than the oldest sailor with a rating of 10.
SELECT S.sname
FROM Sailors S
WHERE S.age > ( SELECT MAX ( S2.age )
FROM Sailors S2
WHERE S2.rating = 10)
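As a quick check of the aggregate operators, the following sketch runs the example queries through Python's sqlite3 module. The five Sailors rows are assumptions chosen so that the answers are easy to verify by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL);
INSERT INTO Sailors VALUES (22, 'Dustin', 7, 45.0), (29, 'Dustin', 1, 33.0),
                           (31, 'Lubber', 8, 55.5), (58, 'Rusty', 10, 35.0),
                           (71, 'Zorba', 10, 16.0);
""")

def scalar(query):
    """Return the single value produced by an aggregate query."""
    return conn.execute(query).fetchone()[0]

sailor_count = scalar("SELECT COUNT(*) FROM Sailors S")
name_count = scalar("SELECT COUNT(DISTINCT S.sname) FROM Sailors S")
age_sum = scalar("SELECT SUM(S.age) FROM Sailors S")
avg_age_10 = scalar("SELECT AVG(S.age) FROM Sailors S WHERE S.rating = 10")

# Name and age of the oldest sailor, using a nested query for MAX.
oldest = conn.execute("""
    SELECT S.sname, S.age FROM Sailors S
    WHERE S.age = (SELECT MAX(S2.age) FROM Sailors S2)""").fetchall()

# Sailors older than the oldest sailor with a rating of 10.
older_than_best = sorted(row[0] for row in conn.execute("""
    SELECT S.sname FROM Sailors S
    WHERE S.age > (SELECT MAX(S2.age) FROM Sailors S2 WHERE S2.rating = 10)"""))
```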
Group by Having Clause:
So far the aggregate operators have been applied to all the rows of a relation. To apply aggregate
operations to each of a number of groups of rows in a relation, we use the GROUP BY clause;
the optional HAVING clause then restricts which groups appear in the result.
The HAVING clause was added to SQL because the WHERE keyword cannot be used with
aggregate functions.
ORDER BY:
The ORDER BY keyword is used to sort the result-set in ascending or descending order.
The ORDER BY keyword sorts the records in ascending order by default. To sort the records in
descending order, use the DESC keyword.
Syntax:
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s);
Example: Find the age of the youngest sailor for each rating level.
SELECT S.rating, MIN (S.age)
FROM Sailors S
GROUP BY S.rating
Example: Find the age of the youngest sailor who is eligible to vote (i.e., is at least 18 years
old) for each rating level with at least two such sailors, in ascending order of rating.
SELECT S.rating, MIN(S.age) AS minage
FROM Sailors S
WHERE S.age >= 18
GROUP BY S.rating
HAVING COUNT(*) > 1
ORDER BY S.rating
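Example: For each red boat, find the number of reservations for this boat.
SELECT B.bid, COUNT (*) AS reservationcount
FROM Boats B, Reserves R
WHERE R.bid = B.bid AND B.color = 'red'
GROUP BY B.bid
The condition on B.color belongs in the WHERE clause: only columns that appear in the GROUP BY
clause (or aggregate expressions) may appear in the HAVING clause.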
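Example: Find those ratings for which the average age of sailors is the minimum over
all ratings.
SELECT S.rating
FROM Sailors S
GROUP BY S.rating
HAVING AVG (S.age) = (SELECT MIN (T.avgage)
FROM (SELECT AVG (S2.age) AS avgage
FROM Sailors S2
GROUP BY S2.rating) AS T)
An aggregate such as AVG cannot appear in the WHERE clause, and aggregate operations cannot be
nested directly, so the per-rating averages are computed in a subquery in the FROM clause.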
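The interplay of WHERE, GROUP BY, HAVING, and ORDER BY can be seen by running two of the grouping queries with Python's sqlite3 module. The sample rows are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL);
INSERT INTO Sailors VALUES (22, 'Dustin', 7, 45.0), (64, 'Horatio', 7, 35.0),
                           (31, 'Lubber', 8, 55.5), (58, 'Rusty', 10, 35.0),
                           (71, 'Zorba', 10, 16.0), (74, 'Art', 10, 25.0);
""")

# WHERE filters rows first, GROUP BY forms groups, HAVING filters groups,
# and ORDER BY sorts the surviving groups.
youngest_voters = conn.execute("""
    SELECT S.rating, MIN(S.age) AS minage
    FROM Sailors S
    WHERE S.age >= 18
    GROUP BY S.rating
    HAVING COUNT(*) > 1
    ORDER BY S.rating""").fetchall()

# Ratings whose average age is the minimum over all ratings.
min_avg_ratings = [row[0] for row in conn.execute("""
    SELECT S.rating
    FROM Sailors S
    GROUP BY S.rating
    HAVING AVG(S.age) = (SELECT MIN(T.avgage)
                         FROM (SELECT AVG(S2.age) AS avgage
                               FROM Sailors S2
                               GROUP BY S2.rating) AS T)""")]
```

Rating 8 drops out of the first result because only one of its sailors survives the WHERE clause, so its group fails HAVING COUNT(*) > 1.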
Null Values:
A field with a NULL value is a field with no value. If a field in a table is optional, it is possible
to insert a new record or update a record without adding a value to this field. Then, the field will
be saved with a NULL value.
We can disallow null values by specifying NOT NULL as part of the field definition, for example
sname CHAR(20) NOT NULL. There is an implicit NOT NULL constraint for every field listed
in a PRIMARY KEY constraint.
The IS NULL and IS NOT NULL operators are used to test for null values.
Syntax of IS NULL:
SELECT column_names
FROM table_name
WHERE column_name IS NULL;
INNER JOIN: creates a new result table by combining the column values of two
tables (table1 and table2) based on the join predicate.
LEFT JOIN: returns all rows from the left table, even if there are no matches in the right table.
If the ON clause matches zero rows in the right table, the join still
returns a row in the result, but with NULL in each column from the right table.
RIGHT JOIN: returns all rows from the right table, even if there are no matches in the left
table. If the ON clause matches zero rows in the left table, the join
still returns a row in the result, but with NULL in each column from the left table.
In other words, a right join returns all the values from the right table, plus the matched values
from the left table, or NULL where the join predicate has no match.
SQL> SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
RIGHT JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
+------+----------+--------+---------------------+
| ID   | NAME     | AMOUNT | DATE                |
+------+----------+--------+---------------------+
| 3    | kaushik  | 3000   | 2009-10-08 00:00:00 |
| 3    | kaushik  | 1500   | 2009-10-08 00:00:00 |
| 2    | Khilan   | 1560   | 2009-11-20 00:00:00 |
| 4    | Chaitali | 2060   | 2008-05-20 00:00:00 |
+------+----------+--------+---------------------+
FULL JOIN: combines the results of both left and right outer joins.
The joined table contains all records from both tables, with NULLs filled in for missing
matches on either side.
SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
FULL JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
This would produce the following result (each matched row appears only once):
+------+----------+--------+---------------------+
| ID   | NAME     | AMOUNT | DATE                |
+------+----------+--------+---------------------+
| 1    | Ramesh   | NULL   | NULL                |
| 2    | Khilan   | 1560   | 2009-11-20 00:00:00 |
| 3    | kaushik  | 3000   | 2009-10-08 00:00:00 |
| 3    | kaushik  | 1500   | 2009-10-08 00:00:00 |
| 4    | Chaitali | 2060   | 2008-05-20 00:00:00 |
| 5    | Hardik   | NULL   | NULL                |
| 6    | Komal    | NULL   | NULL                |
| 7    | Muffy    | NULL   | NULL                |
+------+----------+--------+---------------------+
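The way an outer join pads unmatched rows with NULL, and how IS NULL picks those rows out, can be sketched with Python's sqlite3 module. Only INNER JOIN and LEFT JOIN are used here because older SQLite versions lack RIGHT and FULL JOIN. The customer names and order amounts follow the result tables above; the OID column and its values are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE ORDERS (OID INTEGER PRIMARY KEY, CUSTOMER_ID INTEGER,
                     AMOUNT INTEGER, DATE TEXT);
INSERT INTO CUSTOMERS VALUES (1, 'Ramesh'), (2, 'Khilan'), (3, 'kaushik'),
                             (4, 'Chaitali'), (5, 'Hardik'), (6, 'Komal'),
                             (7, 'Muffy');
-- OID values are hypothetical.
INSERT INTO ORDERS VALUES (102, 3, 3000, '2009-10-08 00:00:00'),
                          (100, 3, 1500, '2009-10-08 00:00:00'),
                          (101, 2, 1560, '2009-11-20 00:00:00'),
                          (103, 4, 2060, '2008-05-20 00:00:00');
""")

# INNER JOIN keeps only customers that have at least one order.
inner_rows = conn.execute("""
    SELECT C.ID, C.NAME, O.AMOUNT, O.DATE
    FROM CUSTOMERS C INNER JOIN ORDERS O ON C.ID = O.CUSTOMER_ID""").fetchall()

# LEFT JOIN keeps every customer; order columns are NULL when nothing matches.
left_rows = conn.execute("""
    SELECT C.ID, C.NAME, O.AMOUNT, O.DATE
    FROM CUSTOMERS C LEFT JOIN ORDERS O ON C.ID = O.CUSTOMER_ID""").fetchall()

# IS NULL selects exactly the padded rows, i.e. customers without orders.
without_orders = sorted(row[0] for row in conn.execute("""
    SELECT C.NAME
    FROM CUSTOMERS C LEFT JOIN ORDERS O ON C.ID = O.CUSTOMER_ID
    WHERE O.AMOUNT IS NULL"""))
```

Because every order here belongs to an existing customer, a FULL JOIN on this data would return the same eight rows as the LEFT JOIN.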
Example: A database contains the relation Student (sid, sname, subject1, subject2, subject3,
total). Create a trigger that computes total when the subject values are inserted into the Student relation.
CREATE TRIGGER T1
BEFORE INSERT
ON Student
FOR EACH ROW
SET NEW.total = NEW.subject1 + NEW.subject2 + NEW.subject3;
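The trigger above assigns to NEW.total in a BEFORE INSERT trigger, which is MySQL-style syntax. As a runnable sketch, the same effect can be obtained in SQLite (via Python's sqlite3 module) with an AFTER INSERT trigger that updates the newly inserted row. The SQLite adaptation and the sample row are assumptions made for illustration, not part of the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (sid INTEGER PRIMARY KEY, sname TEXT,
                      subject1 INTEGER, subject2 INTEGER, subject3 INTEGER,
                      total INTEGER);

-- SQLite cannot assign to NEW.total, so the row is updated after the insert.
CREATE TRIGGER T1
AFTER INSERT ON Student
FOR EACH ROW
BEGIN
    UPDATE Student
    SET total = NEW.subject1 + NEW.subject2 + NEW.subject3
    WHERE sid = NEW.sid;
END;
""")

# Insert a row without supplying total; the trigger fills it in.
conn.execute(
    "INSERT INTO Student (sid, sname, subject1, subject2, subject3) "
    "VALUES (1, 'Asha', 70, 80, 90)"
)
total = conn.execute("SELECT total FROM Student WHERE sid = 1").fetchone()[0]
```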
Example: A database contains the CUSTOMERS table (id, name, age, address, salary).
Create a row-level trigger on the CUSTOMERS table that displays the salary difference between
the old and new values and fires for INSERT, UPDATE, or DELETE
operations performed on the CUSTOMERS table.
Drop Trigger: a trigger can be removed from the database with the DROP TRIGGER command.
Syntax: DROP TRIGGER [IF EXISTS] [schema_name.]trigger_name
Example: DROP TRIGGER IF EXISTS display_salary_changes
Active Databases:
An active database is a database that has a set of associated triggers. Such databases can be difficult to
maintain because of the complexity of understanding the combined effect of the triggers. In
such a database, before executing a statement that modifies the database, the DBMS first checks
whether the statement activates any trigger.
If a trigger is activated, the DBMS evaluates its condition part and executes its action part
only if the condition evaluates to true. A single statement may activate more than one
trigger.
In that situation, the DBMS processes the triggers in some arbitrary order. Executing the action part of
a trigger may in turn activate other triggers, or even the same trigger that initiated the action. A
trigger that activates itself in this way is called a 'recursive trigger'. The DBMS executes such chains
of triggers.
Schema Refinement: It is the last step before physical design; it checks the tables
for redundancy and addresses the problems caused by redundancy by means of decomposition.
Redundancy means storing multiple copies of the same data in the database, which leads to several
problems.
For example, consider the Hourly_Emps relation. ssn is the key of this relation, and the hourly_wages
attribute is determined by the rating attribute. That is, for a given rating value, there is only one
permissible hourly_wages value. This integrity constraint is an example of a functional
dependency. It leads to possible redundancy in the relation Hourly_Emps.
Redundant storage: the rating value 8 corresponds to the hourly wage 10, and this association is
repeated three times; the rating value 5 corresponds to the hourly wage 7, and this association is
repeated two times.
Deletion anomalies: it may not be possible to delete certain information without losing some other,
unrelated, information as well.
For example, if we delete all tuples with a given rating value, we lose the association between that
rating value and its hourly_wages value.
Update anomalies: if one copy of such repeated data is updated, the database will be in an
inconsistent state unless all copies are similarly updated.
For example, the hourly_wages in the first tuple (row) could be updated without making a similar
change in the second tuple.
Problems related to decomposition: decomposing a relation schema can create more problems
than it solves. Before decomposing a table into several smaller tables, we therefore have to ask
two questions: why is the decomposition needed, and what problems (if any) does it
cause? In particular, a decomposition should be lossless and, where possible, dependency preserving.
Example: Relation R(ABC) is decomposed into R1(AB) and R2(BC). Check whether the
decomposition is lossy or lossless.
R(ABC)=
A B C
1 2 1
2 2 2
3 3 2
R1(AB)=
A B
1 2
2 2
3 3
R2(BC)=
B C
2 1
2 2
3 2
R1 X R2=
A B B C
1 2 2 1
1 2 2 2
1 2 3 2
2 2 2 1
2 2 2 2
2 2 3 2
3 3 2 1
3 3 2 2
3 3 3 2
R1 ⋈ R2 =
A B C
1 2 1
1 2 2
2 2 1
2 2 2
3 3 2
R1 ⋈ R2 ≠ R: the join contains the spurious tuples (1, 2, 2) and (2, 2, 1), so this decomposition is lossy.
Example: Relation R(ABC) is decomposed into R1(AB) and R2(AC). Check whether the
decomposition is lossy or lossless.
R(ABC)=
A B C
1 1 1
2 1 2
3 2 1
4 3 2
R(ABC) is decomposed into R1(AB) and R2(AC).
R1 =
A B
1 1
2 1
3 2
4 3
R2=
A C
1 1
2 2
3 1
4 2
R1 ⋈ R2 = R
A B C
1 1 1
2 1 2
3 2 1
4 3 2
This decomposition is lossless.
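The two worked examples can be reproduced with a short Python sketch that projects a relation instance onto two attribute lists, natural-joins the projections, and compares the result with the original. The helper names are assumptions made for this sketch. Note that it only tests a particular instance; a decomposition is lossless as a schema property only if this holds for every legal instance.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; rows are dicts."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(seen)]

def natural_join(left, right):
    """Join two non-empty lists of dict rows on their common attributes."""
    common = [a for a in left[0] if a in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]

def is_lossless_on(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly `rows`."""
    joined = natural_join(project(rows, attrs1), project(rows, attrs2))
    as_set = lambda rs: {tuple(sorted(r.items())) for r in rs}
    return as_set(joined) == as_set(rows)

def relation(tuples):
    """Build dict rows over attributes A, B, C from plain tuples."""
    return [dict(zip("ABC", t)) for t in tuples]

first = relation([(1, 2, 1), (2, 2, 2), (3, 3, 2)])
second = relation([(1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 3, 2)])

first_is_lossless = is_lossless_on(first, ["A", "B"], ["B", "C"])
second_is_lossless = is_lossless_on(second, ["A", "B"], ["A", "C"])
spurious_join_size = len(natural_join(project(first, ["A", "B"]),
                                      project(first, ["B", "C"])))
```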
A decomposition of a relation schema R with a set of FDs F into R1 and R2, with projected FD sets F1 and F2,
is said to be dependency preserving if the closure of F1 ∪ F2 equals the closure of F,
that is, (F1 ∪ F2)+ = F+.
Functional Dependency:
A functional dependency is a relationship that exists between two attributes (or sets of attributes). It typically exists
between the primary key and a non-key attribute within a table. A functional dependency is
represented by an arrow sign (→).
If column X of a table uniquely identifies column Y of the same table, this is written as
X → Y (attribute Y is functionally dependent on attribute X; X is the determinant and Y is the
dependent).
A functional dependency X → Y holds in a relation if any two tuples that have the same value of attribute X
also have the same value of attribute Y.
For example, in relation STUDENT shown in table 1, the functional dependencies
STUD_NO → STUD_NAME and STUD_NO → STUD_PHONE hold,
but
STUD_NAME → STUD_ADDR does not hold.
Example:
Consider a table with two columns Employee_Id and Employee_Name.
{Employee_id, Employee_Name} → Employee_Id is a trivial functional dependency as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial
dependencies too.
Example:
ID → Name,
Name → DOB
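Functional Dependency Set: the functional dependency set (FD set) of a relation is the set of all
FDs that hold in the relation.
Closure of a set of functional dependencies (F+): the closure F+ of a set of functional dependencies F is the
set of all FDs that include F as well as all dependencies that can be inferred from F.
Armstrong Axioms: the Armstrong axioms are a set of inference rules used to find
the closure F+ from a given set of functional dependencies F:
Reflexivity: if Y is a subset of X, then X → Y.
Augmentation: if X → Y, then XZ → YZ.
Transitivity: if X → Y and Y → Z, then X → Z.
The following rules can be derived from them:
Union: if X → Y and X → Z, then X → YZ.
Decomposition: if X → YZ, then X → Y and X → Z.
Pseudo transitivity: if X → Y and WY → Z, then WX → Z.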
Example: A relation R(ABCGHI) has the set of functional dependencies
F = {A → B, A → C, CG → H, CG → I, B → H}. Find dependencies in the closure of F
for relation R.
Given set of functional dependencies F = {A → B, A → C, CG → H, CG → I, B → H}
Here we have
A → B and B → H, so we can write A → H (Transitivity)
CG → H and CG → I, so we can write CG → HI (Union)
A → C and CG → H, so we can write AG → H (Pseudo transitivity)
A → C and CG → I, so we can write AG → I (Pseudo transitivity)
The closure F+ therefore includes {A → B, A → C, CG → H, CG → I, B → H, A → H,
CG → HI, AG → H, AG → I}, together with the trivial dependencies and others obtained by further applications of the rules.
Example: A relation R(ABCDEF) has the set of functional dependencies
F = {A → B, A → C, CD → E, CD → F, B → E}. Find dependencies in the closure of F
for relation R.
Given set of functional dependencies F = {A → B, A → C, CD → E, CD → F, B → E}
Here we have
A → B and A → C, so we can write A → BC (Union)
CD → E and CD → F, so we can write CD → EF (Union)
A → B and B → E, so we can write A → E (Transitivity)
A → C and CD → E, so we can write AD → E (Pseudo transitivity)
A → C and CD → F, so we can write AD → F (Pseudo transitivity)
The closure F+ therefore includes {A → B, A → C, CD → E, CD → F, B → E,
A → BC, CD → EF, A → E, AD → F, AD → E}, together with the trivial dependencies and others obtained by further applications of the rules.
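Rather than enumerating F+ (which grows quickly), one usually checks whether a single dependency X → Y is in F+ by computing the attribute closure X+ and testing whether it contains Y. The sketch below does this for the two examples above; the function names and the representation of an FD as a pair of strings of single-letter attributes are assumptions made for illustration.

```python
def attribute_closure(attrs, fds):
    """Return the set of attributes determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if lhs -> rhs is in the closure of fds."""
    return set(rhs) <= attribute_closure(lhs, fds)

F1 = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
F2 = [("A", "B"), ("A", "C"), ("CD", "E"), ("CD", "F"), ("B", "E")]

# Dependencies derived by hand in the two examples.
derived_1 = [("A", "H"), ("CG", "HI"), ("AG", "H"), ("AG", "I")]
derived_2 = [("A", "BC"), ("CD", "EF"), ("A", "E"), ("AD", "E"), ("AD", "F")]

all_hold_1 = all(implies(F1, lhs, rhs) for lhs, rhs in derived_1)
all_hold_2 = all(implies(F2, lhs, rhs) for lhs, rhs in derived_2)
ag_closure = attribute_closure("AG", F1)
```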
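To find the minimal set (canonical cover) of functional dependencies we follow three steps:
write every FD with a single attribute on the right-hand side, remove extraneous attributes from
the left-hand sides, and remove redundant FDs.
For example, with the FDs A → B, B → C, and AB → C, attribute B is extraneous in AB → C, because from
attribute A we can get attribute B, but from attribute B we can't get attribute A.
To remove the extraneous attribute B, the functional dependency AB → C is rewritten as A → C.
After removing the extraneous attribute B the FDs are
A → B,
B → C,
A → C
Here A → C is redundant, because it follows from A → B and B → C by transitivity, which leaves
A → B,
B → C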
Example2: Find the canonical cover for the given set of functional dependencies
G = {A → C, AB → C, C → DI, CD → I, EC → AB, EI → C, A → E}.
AB → C
To find the extraneous attribute we compute the closure of A and the closure of B.
Closure of A: {A}+ = {A, C, D, I, E, B}
Closure of B: {B}+ = {B}
Since C is in {A}+, attribute B is extraneous, and AB → C reduces to
A → C
CD → I
To find the extraneous attribute we compute the closure of C and the closure of D.
Closure of D: {D}+ = {D}
Closure of C: {C}+ = {C, D, I}
Since I is in {C}+, attribute D is extraneous, and CD → I reduces to C → I, which already follows from C → DI.
EI → C
To find the extraneous attribute we compute the closure of E and the closure of I.
Closure of E: {E}+ = {E}
Closure of I: {I}+ = {I}
There is no extraneous attribute.
EC → AB
To find the extraneous attribute we compute the closure of E and the closure of C.
Closure of E: {E}+ = {E}
Closure of C: {C}+ = {C, D, I}
Neither closure contains A or B, so there is no extraneous attribute.
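The test used in this example (an attribute on the left-hand side is extraneous if the remaining left-hand attributes already determine the right-hand side) can be sketched in Python as follows. The function names and the string-pair representation of FDs are assumptions made for this sketch.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def extraneous_lhs(fd, fds):
    """Left-hand attributes of `fd` that can be dropped without losing it."""
    lhs, rhs = fd
    if len(lhs) < 2:
        return []
    return [a for a in lhs
            if set(rhs) <= closure(set(lhs) - {a}, fds)]

G = [("A", "C"), ("AB", "C"), ("C", "DI"), ("CD", "I"),
     ("EC", "AB"), ("EI", "C"), ("A", "E")]

in_ab_c = extraneous_lhs(("AB", "C"), G)    # B can be dropped
in_cd_i = extraneous_lhs(("CD", "I"), G)    # D can be dropped
in_ei_c = extraneous_lhs(("EI", "C"), G)    # nothing can be dropped
in_ec_ab = extraneous_lhs(("EC", "AB"), G)  # nothing can be dropped
```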
Example2: Find the canonical cover for the given set of functional dependencies
F = {A → D, E → AD, BC → AD, C → B}
Example3: Find the canonical cover of F = {A → BC, B → CE, A → E, AC → H, D → B}
Normalization
Normalization is the process of organizing the data in the database. It is used to minimize the
redundancy in a relation or set of relations. Normalization divides larger tables into
smaller tables and links them using relationships.
Types of Normal Forms
First Normal Form (1NF):
A relation is in 1NF if every attribute contains only atomic values. That is, an attribute of a table cannot
hold multiple values; it must be single-valued. 1NF does not allow multi-valued
attributes, composite attributes, or their combinations.
The decomposition of the STUDENT table into 1NF has been shown below:
Second Normal Form (2NF):
A relation is in 2NF if it is in 1NF and no non-prime attribute is partially dependent on (i.e.,
dependent on a proper subset of) any candidate key.
Example: consider a relation with attributes STUD_NO, COURSE_NO, and COURSE_FEE:
STUD_NO COURSE_NO COURSE_FEE
1 C1 1000
2 C2 1500
1 C4 2000
4 C3 1000
4 C1 1000
2 C5 2000
Note that there are several courses with the same course fee.
Here,
COURSE_FEE alone cannot decide the value of COURSE_NO or STUD_NO;
COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO;
COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO;
Hence,
COURSE_FEE is a non-prime attribute, as it does not belong to the only candidate key
{STUD_NO, COURSE_NO}.
But COURSE_NO → COURSE_FEE, i.e., COURSE_FEE is dependent on COURSE_NO,
which is a proper subset of the candidate key. The non-prime attribute COURSE_FEE is dependent on
a proper subset of the candidate key, which is a partial dependency, and so this relation is not in
2NF.
To convert the above relation to 2NF, we split the table into two tables:
Table 1: STUD_NO, COURSE_NO
STUD_NO COURSE_NO
1 C1
2 C2
1 C4
4 C3
4 C1
2 C5
Table 2: COURSE_NO, COURSE_FEE
COURSE_NO COURSE_FEE
C1 1000
C2 1500
C4 2000
C3 1000
C5 2000
NOTE: 2NF reduces the amount of redundant data that is stored. For instance, if there are
100 students taking course C1, we don't need to store its fee of 1000 in all 100 records; instead
we store once, in the second table, that the course fee for C1 is 1000.
Example 2: Let's assume, a school can store the data of teachers and the subjects they teach. In a
school, a teacher can teach more than one subject.
TEACHER table
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
A relation is in 3NF if at least one of the following conditions holds for every non-trivial functional
dependency X → Y:
1. X is a super key.
2. Y is a prime attribute i.e., each element of Y is part of some candidate key.
Transitive dependency – If A->B and B->C are two FDs then A->C is called transitive
dependency.
Example 1: In relation STUDENT
STUD_ID STUD_NAME STUD_STATE STUD_COUNTRY STUD_AGE
1 RAM HARYANA INDIA 20
2 RAM PUNJAB INDIA 19
3 SURESH PUNJAB INDIA 21
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 INDIA
364 UK
EMP_DEPT table:
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
264 Designing
264 Testing
364 Stores
364 Developing
Functional dependencies:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
Now, this is in BCNF because the left-hand side of both functional dependencies is a key.
A relation is in 4NF if it is in Boyce-Codd normal form and has no non-trivial multi-valued
dependency.
A multi-valued dependency A →→ B holds if, for a single value of A, there is a set of values of B
and this set is independent of the values of the remaining attributes of the relation.
Example
STUDENT RELATION
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
entities. Hence, there is no relationship between COURSE and HOBBY.
So to bring the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE and STUDENT_HOBBY.
Student_Course
STU_ID COURSE
21 COMPUTER
21 MATH
34 CHEMISTRY
74 BIOLOGY
59 PHYSICS
Student_Hobby
STU_ID HOBBY
21 DANCING
21 SINGING
34 DANCING
74 CRICKET
59 HOCKEY
Fifth normal form (5NF):
A relation is in 5NF if it is in 4NF, contains no non-trivial join dependency, and any further
decomposition of it would have to be joined back losslessly.
5NF is satisfied when all the tables are broken into as many tables as possible in
order to avoid redundancy.
5NF is also known as project-join normal form (PJ/NF).
In the above table, John takes both the Computer and Math classes for Semester 1 but he doesn't take
the Math class for Semester 2. In this case, the combination of all these fields is required to identify valid
data. Suppose we add a new semester, Semester 3, but do not yet know the subject or who
will be taking it, so we leave Lecturer and Subject as NULL. But all three columns
together act as the primary key, so we can't leave the other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
Table P1
SEMESTER SUBJECT
Semester 1 COMPUTER
Semester 1 MATH
Semester 2 MATH
Semester 1 CHEMISTRY
Table P2
SUBJECT LECTURER
COMPUTER Ansika
COMPUTER John
MATH John
MATH Akash
CHEMISTRY Praveen
Table P3
SEMESTER LECTURER
Semester 1 Ansika
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
Example: Relation R(ABCDEF) has the set of FDs {AB → C, C → D, C → E, E → F,
F → A}. Find the highest normal form of relation R.
Candidate keys: B appears on no right-hand side, so it belongs to every key. Since {AB}+ = {BC}+ =
{BE}+ = {BF}+ = {A, B, C, D, E, F}, the candidate keys are AB, BC, BE, and BF. The prime attributes
are A, B, C, E, and F; the only non-prime attribute is D.
2NF: the relation must not contain any partial dependency, i.e., an FD whose L.H.S. is a proper
subset of a candidate key and whose R.H.S. is a non-prime attribute.
It is not in 2NF because C is a proper subset of the candidate key BC and D is a non-prime attribute
(C → D).
The highest normal form of the given relation R is first normal form (1NF).
Example: Check the highest normal form of the Student relation R with the set of functional
dependencies F = {Rollno → Name, Rollno → Voterid, Voterid → Age, Voterid → Rollno}.
To check the highest normal form, we can check from highest to lowest.
BCNF: the L.H.S. of every functional dependency is a candidate key or super key of the relation.
The candidate keys of the relation are {Rollno} and {Voterid}.
The relation is in BCNF because the L.H.S. of every functional dependency is a candidate key.
Example: R (A, B, C) has the set of FDs F = {AB → C, C → A}. Show that R is in 3NF but not in
BCNF.
The candidate keys are AB and BC. The relation is in 3NF because AB is a candidate key in the FD
AB → C and A is a prime attribute in the FD C → A.
The relation is not in BCNF because C is not a super key in the FD C → A of
relation R.
Example : Find the highest normal form of a relation R(A,B,C,D,E) with FD set as {BC->D,
AC->BE, B->E}.
Step 1. As we can see, (AC)+ = {A, C, B, E, D}, but none of its proper subsets can determine all attributes of the
relation, so AC is a candidate key. Neither A nor C can be derived from any other attribute of the
relation, so there is only one candidate key, {AC}.
Step 2. Prime attributes are those attributes that are part of a candidate key ({A, C} in this
example); the others are non-prime ({B, D, E} in this example).
Step 3. The relation R is in 1st normal form, as a relational DBMS does not allow multi-valued or
composite attributes.
The relation is in 2nd normal form because BC->D is in 2nd normal form (BC is not a proper
subset of candidate key AC) and AC->BE is in 2nd normal form (AC is candidate key) and B->E
is in 2nd normal form (B is not a proper subset of candidate key AC).
The relation is not in 3rd normal form because in BC->D neither is BC a super key nor is D a
prime attribute, and in B->E neither is B a super key nor is E a prime attribute; to satisfy 3rd
normal form, either the LHS of an FD must be a super key or the RHS must be a prime attribute.
So the highest normal form of the relation is 2nd normal form.
Example: Find the highest normal form in R (A, B, C, D, E) under following functional
dependencies.
ABC → D, CD → AE
1) It is always a good idea to start checking from BCNF, then 3 NF and so on.
2) If any functional dependency satisfied a normal form then there is no need to check for
lower normal form.
For example, ABC –> D is in BCNF (Note that ABC is a super key), so no need to check this
dependency for lower normal forms.
Candidate keys in the given relation are {ABC, BCD}
BCNF: ABC -> D is in BCNF. Let us check CD -> AE, CD is not a super key so this dependency
is not in BCNF. So, R is not in BCNF.
3NF: ABC -> D we don’t need to check for this dependency as it already satisfied BCNF. Let us
consider CD -> AE. Since E is not a prime attribute, so the relation is not in 3NF.
2NF: In 2NF, we need to check for partial dependency. CD which is a proper subset of a candidate
key and it determine E, which is non-prime attribute. So, given relation is also not in 2 NF. So, the
highest normal form is 1 NF.
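The procedure used in these examples (find the candidate keys, then test BCNF, 3NF, and 2NF in that order) can be sketched in Python. This is a brute-force illustration suitable only for small schemas; the function names and the string-pair representation of FDs are assumptions made for this sketch.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(schema):
                keys.append(candidate)
    return keys

def highest_normal_form(schema, fds):
    """Return 'BCNF', '3NF', '2NF' or '1NF' (1NF is taken for granted)."""
    keys = candidate_keys(schema, fds)
    prime = set().union(*keys)
    nonprime = set(schema) - prime
    # Split right-hand sides into single attributes and drop trivial FDs.
    single = [(set(lhs), a) for lhs, rhs in fds for a in rhs if a not in lhs]
    is_superkey = lambda x: closure(x, fds) == set(schema)
    if all(is_superkey(lhs) for lhs, _ in single):
        return "BCNF"
    if all(is_superkey(lhs) or a in prime for lhs, a in single):
        return "3NF"
    # 2NF fails if a proper subset of some key determines a non-prime attribute.
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if closure(part, fds) & nonprime:
                    return "1NF"
    return "2NF"

nf_1 = highest_normal_form("ABCDEF", [("AB", "C"), ("C", "D"), ("C", "E"),
                                      ("E", "F"), ("F", "A")])
nf_2 = highest_normal_form("ABC", [("AB", "C"), ("C", "A")])
nf_3 = highest_normal_form("ABCDE", [("BC", "D"), ("AC", "BE"), ("B", "E")])
nf_4 = highest_normal_form("ABCDE", [("ABC", "D"), ("CD", "AE")])
keys_4 = candidate_keys("ABCDE", [("ABC", "D"), ("CD", "AE")])
```

The four calls correspond to the worked examples above and agree with the answers derived by hand.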