Module 3 Functional Dependency and Normalization
Amity School of Engineering & Technology
Course Objectives:
1.The objective of this course is to get students familiar with Databases and their
use.
2.Case studies
3.Lab work
1. Understand the database fundamentals along with conceptual modeling to deal with
real-life applications.
2. Develop the ability to retrieve and manipulate information for business decision
making from databases
4. Understand the query processing techniques to automate the real time problems of
databases.
1.Functional Dependency
2.Types of Functional Dependency
3.Decomposition: Lossy and Lossless Join
4.Normalization using Functional Dependency
5.Normalization
6.Types of Normalization (1NF, 2NF, 3NF, BCNF, 4NF, 5NF)
7.Multi-valued dependency (4NF)
8.Join dependency (5NF)
X -> Y: Y is functionally dependent (FD) on X
X - Determinant
Y - Dependent
Example:
By this, we can say that the City, Employee Name, and Salary are functionally
dependent on Employee Number.
Key terms
Key Terms Description
Axiom Axioms are a set of inference rules used to infer all the
functional dependencies on a relational database.
Below are the three most important rules for Functional Dependency in Database:
1.Reflexive rule: When y is a subset of x, then x -> y holds.
2.Augmentation rule: When x -> y holds, and c is an attribute set, then xc -> yc also
holds. That is, adding the same attributes to both sides does not change the basic dependency.
3.Transitivity rule: This rule is very similar to the transitive rule in algebra: if
x -> y holds and y -> z holds, then x -> z also holds. x -> y is read as "x functionally
determines y".
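The inference rules can be applied mechanically by computing an attribute closure: the set of all attributes determined by a starting set. The following is a minimal sketch, not a definitive implementation. The function name closure and the representation of each FD as a (left side, right side) pair of sets are assumptions made for illustration; the sample FDs are the Company, CEO and Age dependencies used in this module's transitive dependency example.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)  # reflexivity: attrs determines itself
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, the right side is too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [({"Company"}, {"CEO"}), ({"CEO"}, {"Age"})]
company_closure = closure({"Company"}, fds)
print(sorted(company_closure))
```

Because Age appears in the closure of Company, Company -> Age follows from the two given dependencies, which is the transitivity rule in action.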
There are mainly four types of Functional Dependency in DBMS:
1.Multi-valued Dependency
2.Trivial Functional Dependency
3.Non-Trivial Functional Dependency
4.Transitive Dependency
In this example, maf_year and colour are independent of each other but dependent on car_model.
These two columns are therefore said to be multivalued dependent on car_model.
This dependence can be represented like this:
car_model ->> maf_year
car_model ->> colour
Emp_id Emp_name
AS555 Harry
AS811 George
AS999 Kevin
Example:
{Company} -> {CEO} (if we know the Company, we know the CEO's name)
But CEO is not a subset of Company, and hence it is a non-trivial functional
dependency.
{Company} -> {CEO} (if we know the company, we know its CEO's name)
{CEO} -> {Age} (if we know the CEO, we know the CEO's age)
Therefore, according to the rule of transitive dependency:
{Company} -> {Age} should hold. That makes sense because if we know the company
name, we can find out its CEO's age.
Note: You need to remember that a transitive dependency can only occur in a relation of
three or more attributes.
5.It helps you to find the facts regarding the database design
FDs in R include
{stuId}→{lastName}, but not the reverse
{stuId} →{lastName, major, credits, status, socSecNo, stuId}
{socSecNo} →{stuId, lastName, major, credits, status, socSecNo}
{credits}→{status}, but not {status}→{credits}
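Whether an FD is consistent with a given set of rows can be checked directly from the definition: any two rows that agree on the left side must agree on the right side. The sketch below assumes a helper named fd_holds and three made-up student rows (the sample values are hypothetical, not taken from R). Note that sample data can only refute an FD; it cannot prove that the FD holds for every legal state of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if rows that agree on lhs always agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first right-side value seen for each left-side value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows, for illustration only.
students = [
    {"stuId": "S1", "lastName": "Rao", "credits": 15, "status": "Freshman"},
    {"stuId": "S2", "lastName": "Rao", "credits": 40, "status": "Sophomore"},
    {"stuId": "S3", "lastName": "Khan", "credits": 20, "status": "Freshman"},
]

print(fd_holds(students, ["stuId"], ["lastName"]))   # stuId -> lastName
print(fd_holds(students, ["lastName"], ["stuId"]))   # the reverse
print(fd_holds(students, ["credits"], ["status"]))   # credits -> status
print(fd_holds(students, ["status"], ["credits"]))   # the reverse
```

In this sample two students share a last name and two share a status, so the reverse directions fail while the forward directions hold.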
1.ZipCode→Address,City
16652 is Huntingdon’s ZIP
2.ArtistName→BirthYear
Picasso was born in 1881
3.Autobrand→Manufacturer, Engine type
Pontiac is built by General Motors with gasoline engine
4.Author, Title→PublDate
Shakespeare’s Hamlet was published in 1600
Pres->Spouse, BD,DD
Pres, State->Party
Pres, Party->State
Pres->VP
Decomposition
•Decomposition in DBMS removes redundancy, anomalies and inconsistencies from a
database by dividing the table into multiple tables. A functional decomposition is the
process of breaking down the functions of an organization into progressively greater
(finer and finer) levels of detail.
•The decomposition of a relation schema R consists of replacing the relation schema
by two or more relation schemas that each contain a subset of the attributes of R and
together include all attributes in R.
•Decomposition helps in eliminating some of the problems of bad design such as
redundancy, inconsistencies and anomalies.
The following are the types −
1.Lossless Decomposition - A decomposition is lossless if it is feasible to reconstruct
relation R from the decomposed tables using joins. This is the preferred choice. No
information is lost from the relation when it is decomposed, and the join results in
the same original relation.
2.Lossy Decomposition - As the name suggests, when a relation is decomposed into
two or more relation schemas, some information is lost: the original relation cannot
be recovered exactly by joining them.
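For a decomposition into exactly two relations there is a standard test: the split is lossless if the common attributes functionally determine all attributes of at least one of the two relations. The sketch below is illustrative only. The helper names and the FD roll_no -> sname, department on a STUDENT(roll_no, sname, department) relation are assumptions made for the example.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Test a two-way decomposition: (R1 ∩ R2) must determine R1 or R2."""
    common = set(r1) & set(r2)
    if not common:
        return False  # no shared attribute to join on
    determined = closure(common, fds)
    return set(r1) <= determined or set(r2) <= determined


# Assumed FD for illustration: roll_no is the key of STUDENT.
fds = [({"roll_no"}, {"sname", "department"})]

split_on_name = is_lossless_binary({"roll_no", "sname"}, {"sname", "department"}, fds)
split_on_roll = is_lossless_binary({"roll_no", "sname"}, {"roll_no", "department"}, fds)
print(split_on_name, split_on_roll)
```

Splitting on sname fails the test because sname determines nothing else, while splitting on roll_no passes because roll_no determines every attribute.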
Consider that we have a table STUDENT with three attributes: roll_no, sname
and department.
STUDENT:
No_name: Name_dept :
In a lossy decomposition, spurious tuples are generated when a natural join is applied to
the relations in the decomposition.
stu_joined :
Stu_name: Stu_dept :
Now, when these two relations are joined on the common column 'roll_no', the
resultant relation will look like stu_joined.
stu_joined :
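The difference between the two decompositions can be reproduced with a small natural join. This is a sketch under stated assumptions: the two STUDENT rows are hypothetical sample data (two students who share a name), and project, natural_join and as_set are illustrative helper names.

```python
def project(rows, attrs):
    """Keep only attrs from each row and drop duplicate rows."""
    out = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in out:
            out.append(projected)
    return out


def natural_join(left, right):
    """Join two lists of dict rows on all attribute names they share."""
    common = set(left[0]) & set(right[0])
    out = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in common):
                merged = {**l, **r}
                if merged not in out:
                    out.append(merged)
    return out


def as_set(rows):
    """Order-independent view of a list of rows, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


# Hypothetical STUDENT rows: two students with the same name.
student = [
    {"roll_no": 111, "sname": "Asha", "department": "COMPUTER"},
    {"roll_no": 222, "sname": "Asha", "department": "ELECTRICAL"},
]

# Lossy: No_name(roll_no, sname) and Name_dept(sname, department), joined on sname.
lossy_join = natural_join(project(student, ["roll_no", "sname"]),
                          project(student, ["sname", "department"]))

# Lossless: Stu_name(roll_no, sname) and Stu_dept(roll_no, department), joined on roll_no.
lossless_join = natural_join(project(student, ["roll_no", "sname"]),
                             project(student, ["roll_no", "department"]))

print(len(lossy_join), len(lossless_join))
```

Joining on sname pairs every roll number with every department of the shared name, producing spurious tuples, while joining on roll_no returns exactly the original rows.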
What is Normalization?
Normalization is a database design technique that reduces data redundancy and
eliminates undesirable characteristics like Insertion, Update and Deletion Anomalies.
Normalization rules divide larger tables into smaller tables and link them using
relationships.
The inventor of the relational model, Edgar Codd, proposed the theory of
normalization of data with the introduction of the First Normal Form, and he
continued to extend the theory with the Second and Third Normal Forms. Later he joined
Raymond F. Boyce to develop the theory of Boyce-Codd Normal Form.
The theory of data normalization is still being developed further. For
example, there are discussions even on a 6th Normal Form.
Here you see that the Movies Rented column has multiple values. Now let's move into the 1st Normal
Form:
Hence, we require both Full Name and Address to identify a record uniquely. That is
a composite key.
Let's move into second normal form 2NF
What is a KEY?
A KEY is a value used to identify a record in a table uniquely. A KEY could be a single
column or combination of multiple columns
Note: Columns in a table that are NOT used to identify a record uniquely are called
non-key columns (non-key attributes).
We have divided our 1NF table into two tables viz. Table 1 and Table2. Table 1
contains member information. Table 2 contains information on movies rented.
We have introduced a new column called Membership_id which is the primary key for
table 1. Records can be uniquely identified in Table 1 using membership id
A Foreign Key references the primary key of another table! It helps connect your tables.
1.A foreign key can have a different name from its primary key
2.It ensures rows in one table have corresponding rows in another
3.Unlike the primary key, foreign keys do not have to be unique. Most often they aren't
4.Foreign keys can be null even though primary keys cannot
You will only be able to insert values into your foreign key that exist in the unique key
in the parent table. This helps in referential integrity.
The above problem can be overcome by declaring membership id from Table2 as
foreign key of membership id from Table1
Now, if somebody tries to insert a value in the membership id field that does not exist
in the parent table, an error will be shown!
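The foreign key behaviour described here can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the table names members and rentals, the column names, and the sample values are assumptions chosen for illustration, and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Parent table (Table 1): member information.
conn.execute(
    "CREATE TABLE members (membership_id INTEGER PRIMARY KEY, full_name TEXT)"
)
# Child table (Table 2): membership_id is a foreign key into members.
conn.execute(
    "CREATE TABLE rentals ("
    " membership_id INTEGER REFERENCES members(membership_id),"
    " movie TEXT)"
)

conn.execute("INSERT INTO members VALUES (1, 'Member One')")
conn.execute("INSERT INTO rentals VALUES (1, 'Movie A')")  # parent row exists

try:
    # There is no member 101 in the parent table, so this insert is refused.
    conn.execute("INSERT INTO rentals VALUES (101, 'Movie B')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rental_count = conn.execute("SELECT COUNT(*) FROM rentals").fetchone()[0]
print(rejected, rental_count)
```

Only the rental that points at an existing member is stored; the other insert raises an integrity error, which is how referential integrity is enforced.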
Note – If A->B and B->C are two FDs then A->C is called transitive
dependency.
The normalization of 2NF relations to 3NF involves the removal of
transitive dependencies.
Example-1:
In relation STUDENT given in Table
FD set:
{STUD_NO -> STUD_NAME,
STUD_NO -> STUD_STATE,
STUD_STATE -> STUD_COUNTRY,
STUD_NO -> STUD_AGE,
STUD_NO -> STUD_COUNTRY}
Candidate Key:
{STUD_NO}
For the relation in the given table, STUD_NO -> STUD_STATE and STUD_STATE ->
STUD_COUNTRY are true.
So STUD_COUNTRY is transitively dependent on STUD_NO. This violates the
third normal form.
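To convert it to third normal form, decompose STUDENT so that the transitive
dependency is removed:
STUDENT (STUD_NO, STUD_NAME, STUD_STATE, STUD_AGE)
STATE_COUNTRY (STUD_STATE, STUD_COUNTRY)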
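The 3NF condition for Example-1 can be checked from the FD set alone: an FD X -> A is a violation when X is not a super key and A is not part of any candidate key. The sketch below uses the FD set and candidate key given in the example; the function names and the set-based representation are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(all_attrs, candidate_keys, fds):
    """List (lhs, attribute) pairs that break the 3NF condition."""
    prime = set().union(*candidate_keys)  # attributes that belong to some key
    violations = []
    for lhs, rhs in fds:
        lhs_is_superkey = closure(lhs, fds) >= all_attrs
        for attr in sorted(rhs - lhs):
            if not lhs_is_superkey and attr not in prime:
                violations.append((sorted(lhs), attr))
    return violations


attrs = {"STUD_NO", "STUD_NAME", "STUD_STATE", "STUD_COUNTRY", "STUD_AGE"}
fds = [
    ({"STUD_NO"}, {"STUD_NAME"}),
    ({"STUD_NO"}, {"STUD_STATE"}),
    ({"STUD_STATE"}, {"STUD_COUNTRY"}),
    ({"STUD_NO"}, {"STUD_AGE"}),
    ({"STUD_NO"}, {"STUD_COUNTRY"}),
]
violations = third_nf_violations(attrs, [{"STUD_NO"}], fds)
print(violations)
```

The only FD reported is STUD_STATE -> STUD_COUNTRY, which is the dependency that has to move into its own relation.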
Example 2:
Below is a student table that has student id, student name, subject id, subject
name, and address of the student as its columns.
This implies that the table possesses a transitive functional dependency, and it does
not fulfill the third normal form criteria.
Now, to change the table to the third normal form, you need to divide the table as
shown below:
As you can see in both the tables,
•All the non-key attributes are now fully functionally dependent only on
the primary key.
•In the first table, the columns name, subid, and address depend only
on stu_id.
•In the second table, the subject depends only on subid.
We have again divided our tables and created a new table which stores Salutations.
•There are no transitive functional dependencies, and hence our table is in 3NF.
•In Table 3, Salutation ID is the primary key, and in Table 1, Salutation ID is a foreign
key referencing the primary key in Table 3.
Super Key
A super key is a set of one or more columns that can identify a record uniquely in a
table, and the Primary Key is a minimal super key chosen from among them.
Boyce Codd normal form (BCNF)
BCNF is an advanced version of 3NF. It is stricter than 3NF. Boyce Codd Normal Form
is also known as 3.5 NF.
For BCNF, the table should be in 3NF, and for every functional dependency the
Left-Hand Side (LHS) should be a super key of that particular table, so that every
Right-Hand Side (RHS) attribute depends on a super key.
Put differently, a table is in BCNF if no part of a candidate key depends on
other attributes.
For example, in any table with (x, y, z) columns, if (x, y)->z and z->x, then it is a
violation of BCNF. If (x, y) is a key and (x, y)->z, then there should not be
any reverse dependency, directly or partially.
The subject table follows these conditions:
In the above table, student_id and subject together form the primary key because,
using student_id and subject, you can determine all the table columns.
Another important point to be noted here is that one professor teaches only one
subject, but one subject may have two professors.
This shows there is a dependency between subject and professor, i.e. subject
depends on the professor's name.
The above table follows all the normal forms except the Boyce Codd Normal Form.
The table is not in BCNF because subject is a prime attribute while professor, which
determines it, is a non-prime attribute.
To transform the table into BCNF, you divide the table into two parts. The first table
holds stuid, which already exists, together with a newly created column profid.
The second table has the columns profid, subject, and professor,
which satisfies BCNF.
(BCNF): Example 2
For BCNF, the table should be in 3NF, and every Right-Hand Side (RHS) attribute of
the functional dependencies should depend on the super key of that particular table. i.e.
LHS is super key.
Example: Let's assume there is a company where employees work in more than one
department.
EMPLOYEE table:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone
is a key.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
364 UK
EMP_DEPT table:
EMP_DEPT DEPT_TYPE EMP_DEPT_NO
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
264 283
264 300
364 232
364 549
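The BCNF condition (the left side of every FD must be a super key) can be checked with an attribute closure. The sketch below uses the two FDs given for the EMPLOYEE table. The function names are illustrative assumptions, and for the decomposed tables it only considers the listed FDs whose attributes fall inside the table, which is sufficient for this example but is not a full projection of the FD set in general.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(table_attrs, fds):
    """Return the listed FDs applying to the table whose LHS is not a super key."""
    table_attrs = set(table_attrs)
    bad = []
    for lhs, rhs in fds:
        applies = lhs <= table_attrs and (rhs & table_attrs) - lhs
        if applies and not closure(lhs, fds) >= table_attrs:
            bad.append((sorted(lhs), sorted(rhs & table_attrs)))
    return bad


fds = [
    ({"EMP_ID"}, {"EMP_COUNTRY"}),
    ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"}),
]

employee = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
decomposed = [
    {"EMP_ID", "EMP_COUNTRY"},                # EMP_COUNTRY table
    {"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}, # EMP_DEPT table
    {"EMP_ID", "EMP_DEPT"},                   # EMP_DEPT_MAPPING table
]

before = bcnf_violations(employee, fds)
after = [bcnf_violations(table, fds) for table in decomposed]
print(len(before), after)
```

Both FDs violate BCNF in the original table, and none of the three decomposed tables reports a violation.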
BCNF Example and Assessment Questions
Professor(Prof_code,Dept,HOD,Percent_Time)
It is assumed that-
1.A professor can work in more than one department.
2.The percentage of the time he/she spends in each department is given
3.Each department has only one HOD.
Table 1
Department, Prof_code->Percent_time
Table 2
Department->HOD
STUDENT
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two
independent entities.
So to bring the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
In the above table, John takes both Computer and Math class for Semester 1 but he
doesn't take Math class for Semester 2. In this case, a combination of all these fields
is required to identify valid data.
Suppose we add a new semester, Semester 3, but do not yet know the subject or
who will be taking that subject, so we leave Lecturer and Subject as NULL.
But all three columns together act as a primary key, so we can't leave the other two
columns blank.
So to bring the above table into 5NF, we can decompose it into three relations P1, P2
& P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
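A join dependency means the three projections can be joined back together without gaining or losing rows. The sketch below joins P1, P2 and P3 using the rows listed above; storing each relation as a Python set of tuples is an assumption made for illustration, and the duplicate P3 row collapses because sets hold each tuple once.

```python
# P1(SEMESTER, SUBJECT)
p1 = {
    ("Semester 1", "Computer"),
    ("Semester 1", "Math"),
    ("Semester 1", "Chemistry"),
    ("Semester 2", "Math"),
}
# P2(SUBJECT, LECTURER)
p2 = {
    ("Computer", "Anshika"),
    ("Computer", "John"),
    ("Math", "John"),
    ("Math", "Akash"),
    ("Chemistry", "Praveen"),
}
# P3(SEMESTER, LECTURER)
p3 = {
    ("Semester 1", "Anshika"),
    ("Semester 1", "John"),
    ("Semester 2", "Akash"),
    ("Semester 1", "Praveen"),
}

# Natural join of all three: match P1 and P2 on SUBJECT,
# then keep only combinations whose (SEMESTER, LECTURER) pair is in P3.
joined = {
    (subject, lecturer, semester)
    for (semester, subject) in p1
    for (subject2, lecturer) in p2
    if subject == subject2 and (semester, lecturer) in p3
}

for row in sorted(joined):
    print(row)
```

Joining only P1 and P2 would also produce combinations such as Math taught by John in Semester 2; the third relation P3 is what filters those out, which is why all three projections are needed.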
Two employees (Jon & Lester) have two mobile numbers each, so the
company stored them in the same field, as you can see in the table above.
This table is not in 1NF as the rule says “each attribute of a table must have
atomic (single) values”; the emp_mobile values for employees Jon & Lester
violate that rule.
To make the table comply with 1NF we should have the data like this:
emp_id emp_name emp_address emp_mobile
101 Herschel New Delhi 8912312390
102 Jon Kanpur 8812121212
102 Jon Kanpur 9900012222
103 Ron Chennai 7778881212
104 Lester Bangalore 9990000123
104 Lester Bangalore 8123450987
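The conversion to 1NF amounts to emitting one row per atomic value. The sketch below starts from an assumed in-memory form of the unnormalized table, where emp_mobile holds a list of numbers (this input shape is reconstructed for illustration from the 1NF rows above), and flattens it.

```python
# Unnormalized rows: (emp_id, emp_name, emp_address, [emp_mobile, ...]).
# The list-valued last field is an assumed representation of the
# multi-valued emp_mobile column.
unnormalized = [
    (101, "Herschel", "New Delhi", ["8912312390"]),
    (102, "Jon", "Kanpur", ["8812121212", "9900012222"]),
    (103, "Ron", "Chennai", ["7778881212"]),
    (104, "Lester", "Bangalore", ["9990000123", "8123450987"]),
]


def to_1nf(rows):
    """Emit one row per mobile number so that every field is atomic."""
    return [
        (emp_id, name, address, mobile)
        for emp_id, name, address, mobiles in rows
        for mobile in mobiles
    ]


first_nf = to_1nf(unnormalized)
for row in first_nf:
    print(row)
```

Employees with two numbers now appear in two rows, so emp_id alone no longer identifies a row and the pair (emp_id, emp_mobile) is needed instead.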
Second normal form (2NF)
A table is said to be in 2NF if both the following conditions hold:
Table is in 1NF (First normal form)
No non-prime attribute is dependent on a proper subset of any candidate
key of the table.
An attribute that is not part of any candidate key is known as a non-prime
attribute.
Example: Suppose a school wants to store the data of teachers and the
subjects they teach. They create a single table for this. Since a teacher
can teach more than one subject, the table can have multiple rows for the
same teacher.
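The 2NF condition can be tested by taking each proper subset of the candidate key and checking whether it determines any non-prime attribute. The sketch below is illustrative only: the column names teacher_id, subject and teacher_age, the candidate key {teacher_id, subject}, and the FD teacher_id -> teacher_age are assumptions, since the teacher table itself is not reproduced here. It also handles a single candidate key only.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(candidate_key, non_prime, fds):
    """Find non-prime attributes determined by a proper subset of the key."""
    key = sorted(candidate_key)
    found = []
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for subset in combinations(key, size):
            determined = closure(subset, fds) & set(non_prime)
            for attr in sorted(determined):
                found.append((subset, attr))
    return found


# Assumed schema and FD, for illustration only.
fds = [({"teacher_id"}, {"teacher_age"})]
partials = partial_dependencies({"teacher_id", "subject"}, {"teacher_age"}, fds)
print(partials)
```

Under these assumptions teacher_age depends on teacher_id alone, which is a partial dependency, so the table would be split into one table keyed by teacher_id and another holding (teacher_id, subject).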
emp_id emp_name emp_zip emp_state emp_city emp_district
Here, emp_state, emp_city & emp_district are dependent on emp_zip. And
emp_zip is dependent on emp_id, which makes the non-prime attributes (emp_state,
emp_city & emp_district) transitively dependent on the super key (emp_id). This
violates the rule of 3NF.
To make this table comply with 3NF we have to break the table into two
tables to remove the transitive dependency:
employee table:
employee_zip table:
Boyce Codd normal form (BCNF)
It is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is
stricter than 3NF. A table complies with BCNF if it is in 3NF and for every
functional dependency X->Y, X is a super key of the table.
Example: Suppose there is a company wherein employees work in more than
one department. They store the data like this:
emp_id  emp_nationality  emp_dept                 dept_type  dept_no_of_emp
1001    Austrian         Production and planning  D001       200
Functional dependencies in the table above:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate key: {emp_id, emp_dept}
The table is not in BCNF as neither emp_id nor emp_dept alone is a key.
To make the table comply with BCNF we can break the table into three tables
like this:
emp_nationality table:
emp_id emp_nationality
1001 Austrian
1002 American
emp_dept table:
emp_dept                      dept_type  dept_no_of_emp
Production and planning       D001       200
stores                        D001       250
design and technical support  D134       100
Purchasing department         D134       600
emp_dept_mapping table:
emp_id emp_dept
1001 Production and planning
1001 stores
1002 design and technical support
1002 Purchasing department
Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF because in both functional dependencies the left side is a
key.
S.NO | 4NF | 5NF
1. | A relation in 4NF must also be in BCNF (Boyce Codd Normal Form). | A relation in 5NF must also be in 4NF (Fourth Normal Form).
3. | A relation in 4NF may or may not be in 5NF. | A relation in 5NF is always in 4NF.
4. | Fourth Normal Form is weaker than Fifth Normal Form. | Fifth Normal Form is stronger than Fourth Normal Form.
5. | A relation in Fourth Normal Form may have more redundancy. | A relation in Fifth Normal Form has less redundancy.
Thanks