
Unit 3 Notes

Functional Dependency (FD) in a DBMS defines the relationship between attributes, helping maintain data quality and design integrity. It includes rules such as reflexive, augmentation, and transitivity, and types like multivalued, trivial, non-trivial, and transitive dependencies. Normalization is a process to organize data, reduce redundancy, and eliminate anomalies, with various normal forms ensuring minimal redundancy and optimal design.

Uploaded by

Diwakar Singh
Copyright
© All Rights Reserved

Unit-3

What is a Functional Dependency?


A functional dependency (FD) is a constraint between two sets of attributes in a relation
of a database management system (DBMS): the value of one attribute (or set of attributes)
determines the value of another. Functional dependencies help you to maintain the quality
of data in the database. A functional dependency is denoted by an arrow →. X → Y means
that X functionally determines Y, that is, Y is functionally dependent on X. Functional
dependencies play a vital role in telling a good database design from a bad one.

Example:

Employee number   Employee Name   Salary   City
1                 Dana            50000    San Francisco
2                 Francis         38000    London
3                 Andrew          25000    Tokyo

In this example, if we know the value of Employee number, we can obtain Employee Name,
City, Salary, etc. We can therefore say that City, Employee Name, and Salary are
functionally dependent on Employee number.
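The idea can be checked mechanically. The following is a minimal Python sketch, not part of the original notes: the helper name fd_holds and the dictionary keys (emp_no, name, salary, city) are assumptions made for illustration. It tests whether a given table instance violates X → Y. Note that passing the test only shows that the current rows do not contradict the dependency; the dependency itself is a rule about the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


# The employee table from the example above (key names are illustrative).
employees = [
    {"emp_no": 1, "name": "Dana", "salary": 50000, "city": "San Francisco"},
    {"emp_no": 2, "name": "Francis", "salary": 38000, "city": "London"},
    {"emp_no": 3, "name": "Andrew", "salary": 25000, "city": "Tokyo"},
]

print(fd_holds(employees, ["emp_no"], ["name", "salary", "city"]))  # True

# A hypothetical conflicting row: same employee number, different city.
conflicting = employees + [
    {"emp_no": 1, "name": "Dana", "salary": 50000, "city": "Paris"}
]
print(fd_holds(conflicting, ["emp_no"], ["city"]))  # False
```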

Normalization should be part of the database design process. However, it is difficult to
separate the normalization process from the ER modelling process, so the two techniques
should be used concurrently.

Use an entity relationship diagram (ERD) to provide the big picture, or macro view, of an
organization’s data requirements and operations. This is created through an iterative process
that involves identifying relevant entities, their attributes and their relationships.

The normalization procedure focuses on the characteristics of specific entities and
represents the micro view of entities within the ERD.

Rules of Functional Dependencies


Given below are the three most important rules for functional dependencies (Armstrong's
axioms):

 Reflexive rule: If X is a set of attributes and Y is a subset of X, then X -> Y holds.
 Augmentation rule: If x -> y holds and c is a set of attributes, then xc -> yc also holds.
That is, adding the same attributes to both sides does not change the basic dependency.
 Transitivity rule: This rule is very similar to the transitive rule in algebra: if x -> y
holds and y -> z holds, then x -> z also holds. x -> y is read as "x functionally
determines y".
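These rules are usually applied through the attribute closure: the set of all attributes that a given set determines. The sketch below is an illustration in Python; the function name closure and the representation of each dependency as a (left side, right side) pair of sets are assumptions, not something defined in these notes.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)  # reflexive rule: attrs determines itself
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is already determined,
            # the right side follows by transitivity.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# x -> y and y -> z, as in the transitivity rule.
fds = [({"x"}, {"y"}), ({"y"}, {"z"})]

print(sorted(closure({"x"}, fds)))  # ['x', 'y', 'z']
print(sorted(closure({"y"}, fds)))  # ['y', 'z']
```

A dependency x -> z follows from the given set exactly when z is contained in the closure of x, so the first line of output confirms x -> z.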

Types of Functional Dependencies


 Multivalued dependency:
 Trivial functional dependency:
 Non-trivial functional dependency:
 Transitive dependency:

Multivalued dependency in DBMS


Multivalued dependency occurs when a single table contains two or more independent
multivalued attributes. A multivalued dependency is a full constraint between two sets of
attributes in a relation. It requires that certain tuples be present in the relation.

Example:

Car_model Maf_year Color

H001 2017 Metallic

H001 2017 Green

H005 2018 Metallic

H005 2018 Blue

H010 2015 Metallic


H033 2012 Gray

In this example, maf_year and color are independent of each other but both depend on
car_model. These two columns are said to be multivalued dependent on car_model.

This dependence is written with a double arrow:

car_model ->> maf_year

car_model ->> color
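A multivalued dependency can be tested on a table instance: for every value of the determinant, each dependent value must appear together with every combination of the remaining attributes. The Python sketch below illustrates this on the car table above. The helper name mvd_holds is an assumption, and the two-row "broken" table at the end is hypothetical data added only to show a failing case.

```python
from itertools import product


def mvd_holds(rows, x, y):
    """Check X ->> Y on a list of dict rows; Z is every other attribute."""
    z = [a for a in rows[0] if a not in x and a not in y]
    groups = {}
    for row in rows:
        xv = tuple(row[a] for a in x)
        yv = tuple(row[a] for a in y)
        zv = tuple(row[a] for a in z)
        groups.setdefault(xv, set()).add((yv, zv))
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        # Every Y value must be paired with every Z value for this X value.
        if pairs != set(product(ys, zs)):
            return False
    return True


cars = [
    {"car_model": "H001", "maf_year": 2017, "color": "Metallic"},
    {"car_model": "H001", "maf_year": 2017, "color": "Green"},
    {"car_model": "H005", "maf_year": 2018, "color": "Metallic"},
    {"car_model": "H005", "maf_year": 2018, "color": "Blue"},
    {"car_model": "H010", "maf_year": 2015, "color": "Metallic"},
    {"car_model": "H033", "maf_year": 2012, "color": "Gray"},
]
print(mvd_holds(cars, ["car_model"], ["color"]))  # True

# Hypothetical table where year and color are NOT independent.
broken = [
    {"car_model": "H001", "maf_year": 2017, "color": "Metallic"},
    {"car_model": "H001", "maf_year": 2018, "color": "Green"},
]
print(mvd_holds(broken, ["car_model"], ["color"]))  # False
```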

Trivial Functional dependency:


A functional dependency is called trivial if the set of attributes on its right-hand side
is included in the set of attributes on its left-hand side.

So, X -> Y is a trivial functional dependency if Y is a subset of X.

For example:

Emp_id Emp_name

AS555 Harry

AS811 George

AS999 Kevin

Consider this table with two columns Emp_id and Emp_name.

{Emp_id, Emp_name} -> Emp_id is a trivial functional dependency, as Emp_id is a subset of
{Emp_id, Emp_name}.

Non trivial functional dependency in DBMS


A functional dependency A -> B is non-trivial when B is not a subset of A. In a relation,
if attribute B is not a subset of attribute A, then A -> B is considered a non-trivial
dependency.

Example:

Company     CEO             Age
Microsoft   Satya Nadella   51
Google      Sundar Pichai   46
Apple       Tim Cook        57

{Company} -> {CEO} (if we know the Company, we know the CEO's name)

CEO is not a subset of Company, and hence this is a non-trivial functional dependency.

Transitive dependency:
A transitive dependency is a type of functional dependency that is formed indirectly by
two functional dependencies.

Example:

Company     CEO             Age
Microsoft   Satya Nadella   51
Google      Sundar Pichai   46
Alibaba     Jack Ma         54

{Company} -> {CEO} (if we know the company, we know its CEO's name)

{CEO} -> {Age} (if we know the CEO, we know the CEO's age)

Therefore, according to the rule of transitive dependency:

{Company} -> {Age} should hold. That makes sense, because if we know the company name,
we can find out its CEO's age.

Note: You need to remember that transitive dependency can only occur in a relation of
three or more attributes.

What is Normalization?
Normalization is a method of organizing the data in a database that helps you to avoid
data redundancy and insertion, update and deletion anomalies. It is a process of analyzing
relation schemas based on their functional dependencies and primary keys.

Normalization is inherent to relational database theory. It may result in the creation of
additional tables, with key values repeated across tables so that the tables can be linked.

Advantages of Functional Dependency

 Functional dependencies help avoid data redundancy, so the same data is not repeated
at multiple locations in the database
 They help you to maintain the quality of data in the database
 They help you to define the meanings and constraints of the database
 They help you to identify bad designs
 They help you to find facts about the database design

Summary
 A functional dependency exists when one attribute determines another attribute in a
DBMS.
 Axiom, Decomposition, Dependent, Determinant, Union are key terms for functional
dependency
 Four types of dependency are 1) Multivalued 2) Trivial 3) Non-trivial 4)
Transitive
 Multivalued dependency occurs in the situation where there are multiple
independent multivalued attributes in a single table
 A dependency X -> Y is trivial when Y is a subset of X
 A non-trivial dependency occurs when A -> B holds where B is not a subset of A
 A transitive dependency is a type of functional dependency that is formed indirectly
by two functional dependencies
 Normalization is a method of organizing the data in the database which helps you to
avoid data redundancy
What Is Normalization?
Normalization is the branch of relational theory that provides design insights. It is the process
of determining how much redundancy exists in a table. The goals of normalization are to:

 Be able to characterize the level of redundancy in a relational schema


 Provide mechanisms for transforming schemas in order to remove redundancy

Normalization theory draws heavily on the theory of functional dependencies. Normalization
theory defines six normal forms (NF). Each normal form involves a set of dependency
properties that a schema must satisfy and each normal form gives guarantees about the
presence and/or absence of update anomalies. This means that higher normal forms have less
redundancy, and as a result, fewer update problems.

Normalization may also be defined as a database design technique that reduces data
redundancy and eliminates undesirable characteristics such as insertion, update and
deletion anomalies. Normalization rules divide larger tables into smaller tables and link
them using relationships. The purpose of normalization in SQL is to eliminate redundant
(repetitive) data and ensure data is stored logically.

Edgar Codd, the inventor of the relational model, proposed the theory of normalization with
the introduction of the First Normal Form, and he continued to extend the theory with the
Second and Third Normal Forms. Later he joined Raymond F. Boyce to develop the theory of
Boyce-Codd Normal Form.

Normal Forms
Every table in a database can be in one of the normal forms discussed next. Ideally, the
only redundancy we want is between primary keys (PK) and the foreign keys (FK) that
reference them; everything else should be derived from other tables. Six normal forms are
covered here.

Here is a list of Normal Forms

 1NF (First Normal Form)


 2NF (Second Normal Form)
 3NF (Third Normal Form)
 BCNF (Boyce-Codd Normal Form)
 4NF (Fourth Normal Form)
 5NF (Fifth Normal Form)

First Normal Form (1NF)


In the first normal form, only single values are permitted at the intersection of each row and
column; hence, there are no repeating groups.

To normalize a relation that contains a repeating group, remove the repeating group and form
two new relations.

The PK of the new relation is a combination of the PK of the original relation plus an attribute
from the newly created relation for unique identification.

Process for 1NF


We will use the Student_Grade_Report table below, from a School database, as our example
to explain the process for 1NF.
Student_Grade_Report (StudentNo, StudentName, Major, CourseNo, CourseName,
InstructorNo, InstructorName, InstructorLocation, Grade)

 In the Student Grade Report table, the repeating group is the course information. A student can
take many courses.
 Remove the repeating group. In this case, it’s the course information for each student.
 Identify the PK for your new table.
 The PK must uniquely identify the attribute value (StudentNo and CourseNo).
 After removing all the attributes related to the course and student, you are left with the student
course table (StudentCourse).
 The Student table (Student) is now in first normal form with the repeating group removed.
 The two new tables are shown below.

Student (StudentNo, StudentName, Major)


StudentCourse (StudentNo, CourseNo, CourseName, InstructorNo, InstructorName,
InstructorLocation, Grade)

How to update 1NF anomalies


StudentCourse (StudentNo, CourseNo, CourseName, InstructorNo, InstructorName,
InstructorLocation, Grade)

 To add a new course, we need a student.


 When course information needs to be updated, we may have inconsistencies.
 To delete a student, we might also delete critical information about a course.
Second Normal Form (2NF)
For the second normal form, the relation must first be in 1NF. The relation is automatically
in 2NF if the PK comprises a single attribute.

If the relation has a composite PK, then each non-key attribute must be fully dependent on the
entire PK and not on a subset of the PK (i.e., there must be no partial dependency).

Process for 2NF


To move to 2NF, a table must first be in 1NF.

 The Student table is already in 2NF because it has a single-column PK.


 When examining the Student Course table, we see that not all the attributes are fully
dependent on the PK; specifically, all course information. The only attribute that is fully
dependent is grade.
 Identify the new table that contains the course information.
 Identify the PK for the new table.
 The three new tables are shown below.

Student (StudentNo, StudentName, Major)


CourseGrade (StudentNo, CourseNo, Grade)
CourseInstructor (CourseNo, CourseName, InstructorNo, InstructorName,
InstructorLocation)
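The search for partial dependencies in StudentCourse can be sketched in Python. This is an illustration only: the helper names closure and partial_dependencies are assumptions, and the two functional dependencies are read off the description above (Grade depends on the whole key, course information depends on CourseNo alone).

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under fds (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, attributes, fds):
    """Map each proper subset of the key to the non-key attributes it determines."""
    non_key = set(attributes) - set(key)
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) & non_key
            if determined:
                found[subset] = determined
    return found


attributes = ["StudentNo", "CourseNo", "CourseName", "InstructorNo",
              "InstructorName", "InstructorLocation", "Grade"]
key = {"StudentNo", "CourseNo"}
fds = [
    ({"StudentNo", "CourseNo"}, {"Grade"}),
    ({"CourseNo"}, {"CourseName", "InstructorNo",
                    "InstructorName", "InstructorLocation"}),
]

for subset, attrs in partial_dependencies(key, attributes, fds).items():
    print(subset, "->", sorted(attrs))
```

Only CourseNo is reported as a determinant, and Grade is not among the attributes it determines, which matches the split into CourseGrade and CourseInstructor.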

How to update 2NF anomalies


 When adding a new instructor, we need a course.
 Updating course information could lead to inconsistencies for instructor information.
 Deleting a course may also delete instructor information.

Third Normal Form (3NF)


To be in third normal form, the relation must be in second normal form. Also all transitive
dependencies must be removed; a non-key attribute may not be functionally dependent on
another non-key attribute.

Process for 3NF


 Eliminate all dependent attributes in transitive relationship(s) from each of the tables that have
a transitive relationship.
 Create new table(s) with removed dependency.
 Check new table(s) as well as table(s) modified to make sure that each table has a determinant
and that no table contains inappropriate dependencies.
 See the four new tables below.

Student (StudentNo, StudentName, Major)


CourseGrade (StudentNo, CourseNo, Grade)
Course (CourseNo, CourseName, InstructorNo)
Instructor (InstructorNo, InstructorName, InstructorLocation)
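The 3NF condition can also be checked in code. The Python sketch below is an illustration under stated assumptions: the helper names are made up, and the functional dependencies (CourseNo determines CourseName and InstructorNo; InstructorNo determines InstructorName and InstructorLocation) are inferred from the table definitions above. It reports dependencies whose determinant is not a key and whose dependent attribute is a non-key attribute.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def third_nf_violations(attributes, fds, candidate_keys):
    """List (determinant, attribute) pairs that break 3NF."""
    attributes = set(attributes)
    prime = set().union(*candidate_keys)  # attributes that belong to some key
    violations = []
    for lhs, rhs in fds:
        lhs = set(lhs)
        if closure(lhs, fds) >= attributes:
            continue  # determinant is a superkey: allowed
        for attr in set(rhs) - lhs:
            if attr not in prime:
                violations.append((tuple(sorted(lhs)), attr))
    return sorted(violations)


course_fd = ({"CourseNo"}, {"CourseName", "InstructorNo"})
instructor_fd = ({"InstructorNo"}, {"InstructorName", "InstructorLocation"})

# CourseInstructor from the 2NF step still has a transitive dependency.
before = third_nf_violations(
    ["CourseNo", "CourseName", "InstructorNo", "InstructorName", "InstructorLocation"],
    [course_fd, instructor_fd],
    [{"CourseNo"}],
)
print(before)

# After the split, neither table reports a violation.
print(third_nf_violations(["CourseNo", "CourseName", "InstructorNo"],
                          [course_fd], [{"CourseNo"}]))        # []
print(third_nf_violations(["InstructorNo", "InstructorName", "InstructorLocation"],
                          [instructor_fd], [{"InstructorNo"}]))  # []
```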

At this stage, there should be no anomalies in third normal form. Let’s look at the dependency
diagram (Figure 12.1) for this example. The first step is to remove repeating groups, as
discussed above.

Student (StudentNo, StudentName, Major)

StudentCourse (StudentNo, CourseNo, CourseName, InstructorNo, InstructorName,
InstructorLocation, Grade)

To recap the normalization process for the School database, review the dependencies shown in
Figure 12.1.

Figure 12.1 Dependency diagram, by A. Watt.

The abbreviations used in Figure 12.1 are as follows:

 PD: partial dependency


 TD: transitive dependency
 FD: full dependency (Note: FD typically stands for functional dependency. Using FD as an
abbreviation for full dependency is only used in Figure 12.1.)

Boyce-Codd Normal Form (BCNF)


When a table has more than one candidate key, anomalies may result even though the relation
is in 3NF. Boyce-Codd normal form is a stricter version of 3NF: every relation in BCNF is
also in 3NF. A relation is in BCNF if, and only if, every determinant is a candidate key.

BCNF Example 1
Consider the following table (St_Maj_Adv).

Student_id Major Advisor

111 Physics Smith

111 Music Chan

320 Math Dobbs

671 Physics White

803 Physics Smith

The semantic rules (business rules applied to the database) for this table are:

1. Each Student may major in several subjects.


2. For each Major, a given Student has only one Advisor.
3. Each Major has several Advisors.
4. Each Advisor advises only one Major.
5. Each Advisor advises several Students in one Major.

The functional dependencies for this table are listed below. The determinant of the first
is a candidate key; the determinant of the second is not.

1. Student_id, Major -> Advisor
2. Advisor -> Major

Anomalies for this table include:

1. Delete – deleting a student may also delete advisor information
2. Insert – a new advisor cannot be added without a student
3. Update – changing an advisor's major in some rows but not others causes inconsistencies

Note: No single attribute is a candidate key.

The PK can be (Student_id, Major) or (Student_id, Advisor).

To reduce the St_Maj_Adv relation to BCNF, you create two new tables:

1. St_Adv (Student_id, Advisor)


2. Adv_Maj (Advisor, Major)

St_Adv table

Student_id Advisor

111 Smith

111 Chan

320 Dobbs

671 White

803 Smith

Adv_Maj table

Advisor Major

Smith Physics
Chan Music

Dobbs Math

White Physics
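The BCNF test for Example 1 can be sketched in Python using attribute closures. The helper names closure and bcnf_violations are assumptions made for this illustration; the two functional dependencies are the ones listed for St_Maj_Adv.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Non-trivial dependencies whose determinant does not determine every attribute."""
    attributes = set(attributes)
    bad = []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if rhs <= lhs:
            continue  # trivial dependency, always allowed
        if not closure(lhs, fds) >= attributes:
            bad.append((sorted(lhs), sorted(rhs)))
    return bad


st_maj_adv_fds = [
    ({"Student_id", "Major"}, {"Advisor"}),
    ({"Advisor"}, {"Major"}),
]
print(bcnf_violations(["Student_id", "Major", "Advisor"], st_maj_adv_fds))
# [(['Advisor'], ['Major'])]

# The two decomposed tables have no violating dependency.
print(bcnf_violations(["Student_id", "Advisor"], []))                      # []
print(bcnf_violations(["Advisor", "Major"], [({"Advisor"}, {"Major"})]))   # []
```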

BCNF Example 2
Consider the following table (Client_Interview).

ClientNo InterviewDate InterviewTime StaffNo RoomNo

CR76 13-May-02 10.30 SG5 G101

CR56 13-May-02 12.00 SG5 G101

CR74 13-May-02 12.00 SG37 G102

CR56 1-July-02 10.30 SG5 G102

FD1 – ClientNo, InterviewDate –> InterviewTime, StaffNo, RoomNo (PK)

FD2 – StaffNo, InterviewDate, InterviewTime –> ClientNo (candidate key: CK)

FD3 – RoomNo, InterviewDate, InterviewTime –> StaffNo, ClientNo (CK)

FD4 – StaffNo, InterviewDate –> RoomNo

A relation is in BCNF if, and only if, every determinant is a candidate key. The determinant
of FD4 is not a candidate key, so we need to create a table that incorporates the first
three FDs (Client_Interview2 table) and another table (StaffRoom table) for the fourth FD.

Client_Interview2 table

ClientNo InterviewDate InterviewTime StaffNo

CR76 13-May-02 10.30 SG5

CR56 13-May-02 12.00 SG5

CR74 13-May-02 12.00 SG37

CR56 1-July-02 10.30 SG5

StaffRoom table

StaffNo InterviewDate RoomNo

SG5 13-May-02 G101

SG37 13-May-02 G102

SG5 1-July-02 G102

Join Dependency:
A join dependency (JD) is a generalization of a multivalued dependency. A JD ⋈ {R1, R2, ..., Rn}
is said to hold over a relation R if R1, R2, R3, ..., Rn is a lossless-join decomposition
of R. There is no set of sound and complete inference rules for JDs.
Inclusion Dependency:
An inclusion dependency is a statement that some columns of a relation are contained in
other columns (usually of a second relation). A foreign key constraint is an example of an
inclusion dependency.

Lossless Join and Dependency Preserving Decomposition


Last Updated: 28-05-2017
A relation is decomposed when it is not in the appropriate normal form. A relation R should
be decomposed into two or more relations only if the decomposition is lossless join as well
as dependency preserving.
Lossless Join Decomposition
If we decompose a relation R into relations R1 and R2,
 Decomposition is lossy if R1 ⋈ R2 ⊃ R
 Decomposition is lossless if R1 ⋈ R2 = R
To check for lossless join decomposition using an FD set, the following conditions must hold:
1. Union of attributes of R1 and R2 must be equal to the attributes of R. Each attribute of R
must be either in R1 or in R2.
Att(R1) U Att(R2) = Att(R)
2. Intersection of attributes of R1 and R2 must not be empty.
Att(R1) ∩ Att(R2) ≠ Φ
3. The common attributes must be a key for at least one relation (R1 or R2).
Att(R1) ∩ Att(R2) -> Att(R1) or Att(R1) ∩ Att(R2) -> Att(R2)
For Example, A relation R (A, B, C, D) with FD set{A->BC} is decomposed into R1(ABC) and
R2(AD) which is a lossless join decomposition as:
1. First condition holds true as Att(R1) U Att(R2) = (ABC) U (AD) = (ABCD) = Att(R).
2. Second condition holds true as Att(R1) ∩ Att(R2) = (ABC) ∩ (AD) ≠ Φ
3. Third condition holds true as Att(R1) ∩ Att(R2) = A is a key of R1(ABC) because A->BC is
given.
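The three conditions can be sketched as a short Python check for a decomposition into two relations. The function names closure and is_lossless are assumptions for this illustration; the second call uses a hypothetical decomposition to show a failing case.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r, r1, r2, fds):
    """Apply the three conditions to a binary decomposition of r into r1 and r2."""
    r, r1, r2 = set(r), set(r1), set(r2)
    if r1 | r2 != r:          # condition 1: no attribute is lost
        return False
    common = r1 & r2
    if not common:            # condition 2: there is something to join on
        return False
    determined = closure(common, fds)
    # condition 3: the common attributes determine all of R1 or all of R2
    return r1 <= determined or r2 <= determined


fds = [({"A"}, {"B", "C"})]
print(is_lossless("ABCD", "ABC", "AD", fds))  # True
print(is_lossless("ABCD", "AB", "BCD", fds))  # False: B determines neither side
```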
Dependency Preserving Decomposition
If we decompose a relation R into relations R1 and R2, every dependency of R must either be
part of R1 or R2 or be derivable from the combination of the FDs of R1 and R2.
For Example, A relation R (A, B, C, D) with FD set{A->BC} is decomposed into R1(ABC) and
R2(AD) which is dependency preserving because FD A->BC is a part of R1(ABC).
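Dependency preservation can also be tested without computing the projected FD sets explicitly. The Python sketch below uses the standard iterative test: for each dependency, grow its left side using only what each decomposed relation can derive on its own attributes. The names closure and preserves_dependencies are assumptions for this illustration. The second call reuses the St_Maj_Adv decomposition from BCNF Example 1 to show a decomposition that does not preserve a dependency.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves_dependencies(fds, decomposition):
    """True if every FD can be enforced using the decomposed relations alone."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                ri = set(ri)
                # What this relation alone lets us derive from what we have so far.
                gained = closure(result & ri, fds) & ri
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True


print(preserves_dependencies([({"A"}, {"B", "C"})], ["ABC", "AD"]))  # True

st_maj_adv_fds = [
    ({"Student_id", "Major"}, {"Advisor"}),
    ({"Advisor"}, {"Major"}),
]
decomposed = [{"Student_id", "Advisor"}, {"Advisor", "Major"}]
print(preserves_dependencies(st_maj_adv_fds, decomposed))  # False
```

The second result shows that the BCNF decomposition of St_Maj_Adv is lossless but loses the dependency Student_id, Major -> Advisor, which is the usual trade-off between BCNF and dependency preservation.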

Alternative approaches to database design


1. Bottom Up Approach: This approach builds relations on the basis of the
relationships existing among individual attributes. This is not so commonly used
as collecting a large number of attributes initially can be a very complex task.
This approach is also known as Design by Synthesis.
2. Top Down Approach: This approach is known as Design by Analysis as it
begins with certain relations and then, after some analysis, various rules and
methods are applied until all the desirable properties are met.
Normalization and Database Design
During the normalization process of database design, make sure that proposed entities meet
the required normal form before table structures are created. Many real-world databases have
been improperly designed, or have become burdened with anomalies after being improperly
modified over time. You may be asked to redesign and modify existing databases. This can be
a large undertaking if the tables are not properly normalized.

Key Terms and Abbreviations

Boyce-Codd normal form (BCNF): a stricter version of 3NF in which every determinant is a
candidate key

first normal form (1NF): only single values are permitted at the intersection of each row and
column so there are no repeating groups

normalization: the process of determining how much redundancy exists in a table

second normal form (2NF): the relation must be in 1NF and every non-key attribute must be
fully dependent on the entire PK; this holds automatically when the PK comprises a single
attribute

semantic rules: business rules applied to the database

third normal form (3NF): the relation must be in 2NF and all transitive dependencies must be
removed; a non-key attribute may not be functionally dependent on another non-key attribute
4NF (Fourth Normal Form) and 5NF (Fifth Normal Form)
A multivalued dependency occurs when two or more independent relationships are kept in a
single relation; equivalently, the presence of one or more rows in a table implies the
presence of one or more other rows in that same table. Put another way, two attributes (or
columns) in a table are independent of one another, but both depend on a third attribute. A
multivalued dependency always requires at least three attributes because it consists of at
least two attributes that are dependent on a third.
For a dependency A -> B, if multiple values of B exist for a single value of A, then the table
may have a multivalued dependency. For A ->> B to be a multivalued dependency, the table should
have at least 3 attributes (A, B and C), and B and C should be independent of each other. For
example,

PERSON   MOBILE      FOOD_LIKES
Mahesh   9893/9424   Burger / Pizza
Ramesh   9191        Pizza

Person ->-> Mobile,
Person ->-> Food_likes
This is read as “person multidetermines mobile” and “person multidetermines food_likes.”
Note that a functional dependency is a special case of multivalued dependency. In a functional
dependency X -> Y, every x determines exactly one y, never more than one.

Fourth normal form (4NF):

Fourth normal form (4NF) is a level of database normalization in which the only non-trivial
multivalued dependencies are those whose determinant is a candidate key (or a superset of
one). It builds on the first three normal forms (1NF, 2NF and 3NF) and the Boyce-Codd Normal
Form (BCNF). It states that, in addition to meeting the requirements of BCNF, a table must
not store two or more independent multivalued facts about the same entity.
Properties – A relation R is in 4NF if and only if the following conditions are satisfied:
1. It should be in the Boyce-Codd Normal Form (BCNF).
2. The table should not have any non-trivial multivalued dependency whose determinant is not
a candidate key (or a superset of one).
A table with such a multivalued dependency violates the normalization standard of Fourth
Normal Form (4NF) because it creates unnecessary redundancies and can contribute to
inconsistent data. To bring this up to 4NF, it is necessary to break this information into
two tables.
Example – Consider the database of a class which has two relations: R1 contains student
ID (SID) and student name (SNAME), and R2 contains course ID (CID) and course name
(CNAME).
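As a small illustration of breaking the information into two tables, the Python sketch below uses the PERSON / MOBILE / FOOD_LIKES example from earlier in this section. It assumes the multivalued cells are first expanded to one value per row (so Mahesh contributes four rows); the variable names are made up for this sketch.

```python
# One value per cell: every mobile is paired with every food for the same person.
rows = [
    ("Mahesh", "9893", "Burger"),
    ("Mahesh", "9893", "Pizza"),
    ("Mahesh", "9424", "Burger"),
    ("Mahesh", "9424", "Pizza"),
    ("Ramesh", "9191", "Pizza"),
]

# Decompose along Person ->> Mobile and Person ->> Food_likes.
person_mobile = {(p, m) for p, m, _ in rows}
person_food = {(p, f) for p, _, f in rows}

# Natural join of the two projections on PERSON.
rejoined = {
    (p, m, f)
    for p, m in person_mobile
    for q, f in person_food
    if p == q
}

print(len(rows), len(person_mobile), len(person_food))  # 5 3 3
print(rejoined == set(rows))                            # True
```

Each fact (a person's mobile number, a person's food preference) is now stored once, and joining the two tables reproduces the original rows exactly.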

Table – R1(SID, SNAME)


SID SNAME

S1 A

S2 B

Table – R2(CID, CNAME)


CID CNAME

C1 C
C2 D

When their cross product is taken, it results in multivalued dependencies:

Table – R1 X R2
SID SNAME CID CNAME

S1 A C1 C

S1 A C2 D

S2 B C1 C

S2 B C2 D

Multivalued dependencies (MVD) are:


SID->->CID; SID->->CNAME; SNAME->->CNAME
Join dependency – A join dependency is a further generalization of multivalued
dependencies. If the join of R1 and R2 over C is equal to relation R, then we can say that a
join dependency (JD) exists, where R1(A, B, C) and R2(C, D) are the decomposition of a given
relation R(A, B, C, D). Equivalently, R1 and R2 are a lossless decomposition of R. A JD
⋈ {R1, R2, …, Rn} is said to hold over a relation R if R1, R2, …, Rn is a lossless-join
decomposition of R. Here *((A, B, C), (C, D)) is a JD of R if the join of the two projections
over their common attribute is equal to the relation R. In general, *(R1, R2, R3, …) is used
to indicate that the relations R1, R2, R3 and so on are a JD of R.
Let R be a relation schema and R1, R2, R3, …, Rn be a decomposition of R. r(R) is said to
satisfy the join dependency if and only if the natural join of its projections onto
R1, R2, …, Rn is equal to r:

π_R1(r) ⋈ π_R2(r) ⋈ … ⋈ π_Rn(r) = r

Example –

Table – R1
COMPANY PRODUCT

C1 pendrive

C1 mic

C2 speaker
C2 speaker

Company->->Product

Table – R2

AGENT COMPANY

Aman C1

Aman C2

Mohan C1

Agent->->Company

Table – R3
AGENT PRODUCT

Aman pendrive

Aman mic

Aman speaker

Mohan speaker

Agent->->Product

Table – R1⋈R2⋈R3
COMPANY PRODUCT AGENT

C1 pendrive Aman

C1 mic Aman

C2 speaker Aman
C1 speaker Aman

Agent->->Product

Fifth Normal Form / Project-Join Normal Form (5NF):

A relation R is in 5NF if and only if every join dependency in R is implied by the candidate
keys of R. A relation decomposed into smaller relations must have the lossless join property,
which ensures that no spurious or extra tuples are generated when the relations are reunited
through a natural join.
Properties – A relation R is in 5NF if and only if it satisfies the following conditions:
1. R should already be in 4NF.
2. It cannot be further decomposed without loss (that is, it has no join dependency that is
not implied by its candidate keys).
Example – Consider the above schema, with the rule “if a company makes a product and an
agent is an agent for that company, then he always sells that product for the company”. Under
these circumstances, the ACP table is shown as:

Table – ACP
AGENT COMPANY PRODUCT

A1 PQR Nut

A1 PQR Bolt

A1 XYZ Nut

A1 XYZ Bolt

A2 PQR Nut

The relation ACP is decomposed into three relations, R1, R2 and R3, shown below. The natural
join of all three relations gives back ACP.

Table – R1
AGENT COMPANY

A1 PQR
A1 XYZ

A2 PQR

Table – R2
AGENT PRODUCT

A1 Nut

A1 Bolt

A2 Nut

Table – R3
COMPANY PRODUCT

PQR Nut

PQR Bolt

XYZ Nut

XYZ Bolt

The result of the natural join of R1 and R3 over ‘Company’, followed by the natural join of
that result (R13) and R2 over ‘Agent’ and ‘Product’, is the table ACP.
Hence, in this example, the decomposition of ACP is a lossless join decomposition that
eliminates the redundancies. It is the decomposed relations, rather than ACP itself, that
are then in 5NF.
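The join described above can be reproduced with a few lines of Python. This is a sketch for illustration: the helpers natural_join, project and as_set are assumptions, and relations are modelled simply as lists of dictionaries using the ACP rows shown earlier.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts keyed by attribute."""
    joined = []
    for a in r:
        for b in s:
            shared = a.keys() & b.keys()
            if all(a[k] == b[k] for k in shared):
                joined.append({**a, **b})
    return joined


def project(rel, attrs):
    """Projection onto attrs with duplicate rows removed."""
    distinct = {tuple((k, row[k]) for k in attrs) for row in rel}
    return [dict(t) for t in distinct]


def as_set(rel):
    """Order-independent form of a relation, for comparison."""
    return {frozenset(row.items()) for row in rel}


acp = [
    {"agent": a, "company": c, "product": p}
    for a, c, p in [
        ("A1", "PQR", "Nut"), ("A1", "PQR", "Bolt"),
        ("A1", "XYZ", "Nut"), ("A1", "XYZ", "Bolt"),
        ("A2", "PQR", "Nut"),
    ]
]

r1 = project(acp, ["agent", "company"])    # Table R1
r2 = project(acp, ["agent", "product"])    # Table R2
r3 = project(acp, ["company", "product"])  # Table R3

r13 = natural_join(r1, r3)       # joined over company
rejoined = natural_join(r13, r2)  # joined over agent and product

print(len(r13))                         # 6: includes the extra row (A2, PQR, Bolt)
print(as_set(rejoined) == as_set(acp))  # True
```

The intermediate join R13 contains one row that is not in ACP; joining with R2 removes it, so the three-way join returns exactly the original table.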

Summary
 Database design is critical to the successful implementation of a database
management system that meets the data requirements of an enterprise system.
 Normalization in DBMS helps produce database systems that are cost-effective and
have better security models.
 Functional dependencies are a very important component of the data normalization
process
 Most database systems are normalized up to the third normal form.
 A primary key uniquely identifies a record in a table and cannot be null
 A foreign key helps connect tables and references a primary key
