
Unit 3 ADBMS

The document provides an overview of functional dependency and normalization in database management systems (DBMS), detailing types of functional dependencies, normalization processes, and various normal forms (1NF, 2NF, 3NF, BCNF, 4NF, 5NF). It explains the importance of normalization in reducing data redundancy and preventing anomalies, while also discussing the advantages and disadvantages of normalization. Additionally, it touches on denormalization as an optimization technique to improve database performance after normalization.

Uploaded by

Rohit Yadav

RSR RUNGTA COLLEGE OF ENGINEERING AND TECHNOLOGY,

BHILAI

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Subject Notes Subject Name: ADBMS

Course/Semester: MCA-I

UNIT- III

Functional Dependency
A functional dependency is a relationship between two attributes, or two sets of attributes,
of a relation: the value of one determines the value of the other. It typically exists between
the primary key and the non-key attributes of a table.

X → Y

The left side of an FD is known as the determinant, and the right side is known as the
dependent.

For example:

Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.

Here the Emp_Id attribute uniquely identifies the Emp_Name attribute of the employee table,
because if we know the Emp_Id, we can tell the employee name associated with it.

Functional dependency can be written as:

Emp_Id → Emp_Name

We can say that Emp_Name is functionally dependent on Emp_Id.
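
As a rough illustration, the definition can be checked against sample data: X → Y holds in a table if no two rows agree on X but differ on Y. The sketch below assumes rows stored as Python dictionaries; the helper name fd_holds and the sample rows are hypothetical and not part of the original notes. A sample table can only show that a dependency is violated or not violated by the data at hand, it cannot prove that the dependency holds for all possible data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows for the employee table
employees = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Bhilai"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Emp_Address": "Raipur"},
    {"Emp_Id": 3, "Emp_Name": "Asha", "Emp_Address": "Durg"},
]

print(fd_holds(employees, ["Emp_Id"], ["Emp_Name"]))  # True
print(fd_holds(employees, ["Emp_Name"], ["Emp_Id"]))  # False: two employees named Asha
```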

Types of Functional dependency


1. Trivial functional dependency

o A → B is a trivial functional dependency if B is a subset of A.
o Dependencies such as A → A and B → B are also trivial.

Example:

Consider a table with two columns, Employee_Id and Employee_Name.

{Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency, as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial
dependencies too.

2. Non-trivial functional dependency

o A → B is a non-trivial functional dependency if B is not a subset of A.
o When the intersection of A and B is empty, A → B is called completely non-trivial.

Example:

ID → Name
Name → DOB
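
The three cases above depend only on how the two attribute sets overlap, so they can be sketched with plain set operations. The function name classify_fd is a hypothetical helper introduced only for this illustration.

```python
def classify_fd(lhs, rhs):
    """Classify the dependency lhs -> rhs by comparing the two attribute sets."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"                 # B is a subset of A
    if lhs & rhs:
        return "non-trivial"             # B is not a subset of A, but they overlap
    return "completely non-trivial"      # A and B share no attribute


print(classify_fd({"Employee_Id", "Employee_Name"}, {"Employee_Id"}))  # trivial
print(classify_fd({"ID"}, {"Name"}))                                   # completely non-trivial
print(classify_fd({"ID", "Name"}, {"Name", "DOB"}))                    # non-trivial
```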

Normalization
A large database defined as a single relation may result in data duplication. This repetition of
data may result in:
o Very large relations.
o Difficulty maintaining and updating data, since an update may involve searching many
records in the relation.
o Wastage and poor utilization of disk space and resources.
o A greater likelihood of errors and inconsistencies.

To handle these problems, we analyze and decompose relations with redundant data into
smaller, simpler, and well-structured relations that satisfy desirable properties.
Normalization is the process of decomposing relations into relations with fewer attributes.

What is Normalization?

o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize redundancy in a relation or set of relations. It
is also used to eliminate undesirable characteristics such as insertion, update, and deletion
anomalies.
o Normalization divides a larger table into smaller tables and links them using relationships.
o The normal forms are used to reduce redundancy in database tables.

Why do we need Normalization?

The main reason for normalizing relations is to remove insertion, deletion, and update
anomalies. Failure to eliminate these anomalies leads to data redundancy and can cause data
integrity and other problems as the database grows. Normalization consists of a series of
guidelines that help you create a good database structure.

Data modification anomalies can be categorized into three types:

o Insertion Anomaly: An insertion anomaly occurs when a new tuple cannot be inserted
into a relation because some other data is missing.
o Deletion Anomaly: A deletion anomaly occurs when deleting some data results in the
unintended loss of other important data.
o Update Anomaly: An update anomaly occurs when changing a single data value
requires multiple rows of data to be updated.

Types of Normal Forms:

Normalization works through a series of stages called normal forms. The normal forms apply
to individual relations. A relation is said to be in a particular normal form if it satisfies the
constraints of that form.

Following are the various types of Normal forms:


Normal Form   Description

1NF           A relation is in 1NF if every attribute contains only atomic values.

2NF           A relation is in 2NF if it is in 1NF and all non-key attributes are fully
              functionally dependent on the primary key.

3NF           A relation is in 3NF if it is in 2NF and no transitive dependency exists.

BCNF          A stronger definition of 3NF, known as Boyce-Codd normal form.

4NF           A relation is in 4NF if it is in Boyce-Codd normal form and has no
              multi-valued dependency.

5NF           A relation is in 5NF if it is in 4NF, does not contain any join dependency,
              and joining is lossless.

Advantages of Normalization

o Normalization helps to minimize data redundancy.
o Greater overall database organization.
o Data consistency within the database.
o Much more flexible database design.
o Enforces the concept of relational integrity.

Disadvantages of Normalization

o You cannot start building the database before knowing what the user needs.
o The performance degrades when normalizing the relations to higher normal forms, i.e.,
4NF, 5NF.
o It is very time-consuming and difficult to normalize relations of a higher degree.
o Careless decomposition may lead to a bad database design, leading to serious problems.

Normal Forms in DBMS

Normalization is the process of minimizing redundancy in a relation or set of relations.
Redundancy in a relation may cause insertion, deletion, and update anomalies, so
normalization helps to minimize it. Normal forms are used to eliminate or reduce
redundancy in database tables.

1. First Normal Form –

If a relation contains a composite or multi-valued attribute, it violates first normal form.
Equivalently, a relation is in first normal form if every attribute in it is a single-valued
attribute.
Example 1 – Relation STUDENT in table 1 is not in 1NF because of the multi-valued
attribute STUD_PHONE. Its decomposition into 1NF is shown in table 2.

Example 2 –

ID   Name   Courses
------------------
1    A      c1, c2
2    E      c3
3    M      c2, c3

In the above table, Courses is a multi-valued attribute, so the table is not in 1NF.
The table below is in 1NF, as it has no multi-valued attribute:

ID   Name   Course
------------------
1    A      c1
1    A      c2
2    E      c3
3    M      c2
3    M      c3
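
The conversion in Example 2 amounts to emitting one row per course value. The sketch below assumes the multi-valued Courses column is stored as a comma-separated string and that course codes are case-insensitive (so "C2" and "c2" name the same course); both are assumptions made for this illustration, and to_1nf is a hypothetical helper name.

```python
def to_1nf(rows):
    """Split the comma-separated Courses value into one row per course."""
    flat = []
    for student_id, name, courses in rows:
        for course in courses.split(","):
            # strip spaces and normalize case so "C2" and "c2" compare equal
            flat.append((student_id, name, course.strip().lower()))
    return flat


students = [(1, "A", "c1, c2"), (2, "E", "c3"), (3, "M", "C2, c3")]

for row in to_1nf(students):
    print(row)
```

Each output row now holds exactly one course, matching the second table above.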

2. Second Normal Form –

To be in second normal form, a relation must be in first normal form and must not
contain any partial dependency. That is, no non-prime attribute (an attribute that is not
part of any candidate key) may depend on a proper subset of any candidate key of the table.
Partial Dependency – If a proper subset of a candidate key determines a non-prime
attribute, it is called a partial dependency.
Example 1 – Consider table 3 below.

STUD_NO   COURSE_NO   COURSE_FEE
1         C1          1000
2         C2          1500
1         C4          2000
4         C3          1000
4         C1          1000
2         C5          2000

(Note that many courses have the same course fee.)
Here,
COURSE_FEE alone cannot decide the value of COURSE_NO or STUD_NO;
COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO;
COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO.
Hence, COURSE_FEE is a non-prime attribute, as it does not belong to the only
candidate key {STUD_NO, COURSE_NO}.
But COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on
COURSE_NO, which is a proper subset of the candidate key. The non-prime
attribute COURSE_FEE is dependent on a proper subset of the candidate key,
which is a partial dependency, so this relation is not in 2NF.
To convert the above relation to 2NF, we split it into two tables:
Table 1: STUD_NO, COURSE_NO
Table 2: COURSE_NO, COURSE_FEE

Table 1                    Table 2
STUD_NO   COURSE_NO        COURSE_NO   COURSE_FEE
1         C1               C1          1000
2         C2               C2          1500
1         C4               C3          1000
4         C3               C4          2000
4         C1               C5          2000
2         C5

NOTE: 2NF reduces the redundant data stored in the database. For
instance, if 100 students take course C1, we don't need to store its
fee of 1000 in all 100 records. Instead, we store it once in the second
table as the course fee for C1.
Example 2 – Consider the following functional dependencies in relation R(A, B, C, D):
AB -> C [A and B together determine C]
BC -> D [B and C together determine D]
In the above relation, AB is the only candidate key and there is no partial
dependency, i.e., no proper subset of AB determines any non-prime
attribute. So the relation is in 2NF.
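
The split of table 3 in Example 1 can be replayed as two projections, followed by a natural join to confirm that no information was lost. This is a minimal sketch that assumes rows are held as Python dictionaries; project and natural_join are hypothetical helper names, not functions of any database library.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples but keeping order."""
    seen, out = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out


def natural_join(left, right):
    """Join two lists of rows on all the attributes they have in common."""
    out = []
    for lrow in left:
        for rrow in right:
            common = set(lrow) & set(rrow)
            if all(lrow[a] == rrow[a] for a in common):
                out.append({**lrow, **rrow})
    return out


table3 = [
    {"STUD_NO": 1, "COURSE_NO": "C1", "COURSE_FEE": 1000},
    {"STUD_NO": 2, "COURSE_NO": "C2", "COURSE_FEE": 1500},
    {"STUD_NO": 1, "COURSE_NO": "C4", "COURSE_FEE": 2000},
    {"STUD_NO": 4, "COURSE_NO": "C3", "COURSE_FEE": 1000},
    {"STUD_NO": 4, "COURSE_NO": "C1", "COURSE_FEE": 1000},
    {"STUD_NO": 2, "COURSE_NO": "C5", "COURSE_FEE": 2000},
]

table1 = project(table3, ["STUD_NO", "COURSE_NO"])      # 6 rows
table2 = project(table3, ["COURSE_NO", "COURSE_FEE"])   # 5 rows: the fee of C1 is stored once
rejoined = natural_join(table1, table2)

print(len(table1), len(table2))
print(rejoined == table3)  # True: the join recovers the original rows
```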

3. Third Normal Form –

A relation is in third normal form if it is in second normal form and no non-prime
attribute is transitively dependent on a candidate key.
Equivalently, a relation is in 3NF if at least one of the following conditions holds for every
non-trivial functional dependency X -> Y:
1. X is a super key.
2. Y is a prime attribute (each element of Y is part of some candidate key).

Transitive dependency – If A -> B and B -> C are two FDs, then A -> C is called a
transitive dependency.
Example 1 – In relation STUDENT given in table 4,
FD set: {STUD_NO -> STUD_NAME, STUD_NO -> STUD_STATE,
STUD_STATE -> STUD_COUNTRY, STUD_NO -> STUD_AGE}
Candidate Key: {STUD_NO}
For this relation, STUD_NO -> STUD_STATE and
STUD_STATE -> STUD_COUNTRY hold, so STUD_COUNTRY is
transitively dependent on STUD_NO. This violates third normal form.
To convert it to third normal form, we decompose the relation
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE,
STUD_STATE, STUD_COUNTRY, STUD_AGE) as:
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE,
STUD_STATE, STUD_AGE)
STATE_COUNTRY (STATE, COUNTRY)
Example 2 – Consider relation R(A, B, C, D, E) with
A -> BC,
CD -> E,
B -> D,
E -> A
The candidate keys of the above relation are {A, E, CD, BC}. Every attribute on the
right-hand side of a functional dependency is prime, so the relation is in 3NF.
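
The candidate keys claimed in Example 2 can be checked by brute force: compute the attribute closure of every attribute subset, smallest first, and keep the minimal subsets whose closure is the whole relation. This is a sketch under the assumption that attributes are single letters and FDs are given as (lhs, rhs) strings; it is exponential in the number of attributes, so it only suits small textbook relations. The names closure and candidate_keys are hypothetical helpers.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the closure of attrs under fds, given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is every attribute."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            # a superset of a key already found is not minimal
            if any(set(key) <= set(combo) for key in keys):
                continue
            if closure(combo, fds) == set(attributes):
                keys.append("".join(combo))
    return keys


fds = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
keys = candidate_keys("ABCDE", fds)
prime = set("".join(keys))

print(keys)           # ['A', 'E', 'BC', 'CD']
print(sorted(prime))  # ['A', 'B', 'C', 'D', 'E']
```

Since every attribute is prime, condition 2 of the 3NF definition holds for every dependency.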

4. Boyce-Codd Normal Form (BCNF) –

A relation R is in BCNF if R is in third normal form and, for every FD, the
LHS is a super key. In other words, a relation is in BCNF iff X is a super key in
every non-trivial functional dependency X -> Y.
Example 1 – Find the highest normal form of a relation
R(A, B, C, D, E) with FD set {BC -> D, AC -> BE, B -> E}.
Step 1. As we can see, (AC)+ = {A, C, B, E, D}, but none of its
proper subsets can determine all attributes of the relation, so AC is a
candidate key. Neither A nor C can be derived from any other attribute of
the relation, so there is only one candidate key, {AC}.
Step 2. Prime attributes are those attributes that are part of a
candidate key, {A, C} in this example. The others are
non-prime, {B, D, E} in this example.
Step 3. The relation R is in 1st normal form, as a relational DBMS
does not allow multi-valued or composite attributes.
The relation is in 2nd normal form because BC -> D is in 2nd
normal form (BC is not a proper subset of candidate key AC),
AC -> BE is in 2nd normal form (AC is the candidate key), and B -> E is
in 2nd normal form (B is not a proper subset of candidate key
AC).
The relation is not in 3rd normal form because in BC -> D neither
BC is a super key nor D is a prime attribute, and in B -> E neither
B is a super key nor E is a prime attribute. To satisfy 3rd
normal form, either the LHS of an FD must be a super key or the RHS
must be a prime attribute.
So the highest normal form of the relation is 2nd normal form.
Example 2 – Consider relation R(A, B, C)
A -> BC,
B ->
A and B both are super keys so above relation is in BCNF.
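
Steps 1 to 3 of Example 1 can be replayed with an attribute-closure routine. The sketch below assumes single-letter attributes and FDs given as (lhs, rhs) strings; the helper name closure is hypothetical, and the set of prime attributes is taken from Step 2 rather than computed.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


all_attrs = set("ABCDE")
fds = [("BC", "D"), ("AC", "BE"), ("B", "E")]

# Step 1: AC determines every attribute, but neither A nor C does alone.
print(sorted(closure("AC", fds)))  # ['A', 'B', 'C', 'D', 'E']
print(sorted(closure("A", fds)))   # ['A']
print(sorted(closure("C", fds)))   # ['C']

# Step 2: prime attributes come from the single candidate key AC.
prime = set("AC")

# Step 3: an FD violates 3NF when its LHS is not a super key and its RHS
# contains a non-prime attribute that is not already in the LHS.
violations = [
    (lhs, rhs)
    for lhs, rhs in fds
    if closure(lhs, fds) != all_attrs and not (set(rhs) - set(lhs)) <= prime
]
print(violations)  # [('BC', 'D'), ('B', 'E')]
```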
Key Points –
1. BCNF is free from redundancy caused by functional dependencies.
2. If a relation is in BCNF, then 3NF is also satisfied.
3. If all attributes of a relation are prime attributes, then the relation is
always in 3NF.
4. A relation in a relational database is always at least in 1NF.
5. Every binary relation (a relation with only 2 attributes) is
always in BCNF.
6. If a relation has only singleton candidate keys (i.e., every
candidate key consists of only 1 attribute), then the relation is
always in 2NF (because no partial functional dependency is
possible).
7. Sometimes decomposing to BCNF may not preserve functional
dependencies. In that case, go for BCNF only if the lost FD(s) are not
required; otherwise normalize up to 3NF only.
8. There are more normal forms beyond BCNF, such as
4NF and 5NF. But in real-world database systems it is generally
not required to go beyond BCNF.

Exercise 1: Find the highest normal form of R(A, B, C, D, E) under the
following functional dependencies.
ABC -> D
CD -> AE
Important points for solving this type of question:
1) It is a good idea to start checking from BCNF, then 3NF, and so
on.
2) If a functional dependency satisfies a normal form, there is no need
to check it against lower normal forms. For example, ABC -> D is in BCNF (note
that ABC is a super key), so there is no need to check this dependency for lower
normal forms.
Candidate keys in the given relation are {ABC, BCD}.
BCNF: ABC -> D is in BCNF. Now check CD -> AE: CD is not a super
key, so this dependency is not in BCNF. So R is not in BCNF.
3NF: We don't need to check ABC -> D, as it already
satisfies BCNF. Consider CD -> AE: since E is not a prime attribute,
the relation is not in 3NF.
2NF: In 2NF, we need to check for partial dependency. CD is a proper
subset of the candidate key BCD and it determines E, which is a non-prime attribute.
So the given relation is also not in 2NF. So the highest normal form is 1NF.
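
The checking order used in Exercise 1 (BCNF first, then 3NF, then 2NF) can be sketched as a small program. It assumes single-letter attributes, FDs given as (lhs, rhs) strings, and a relation that is already in 1NF; all function names are hypothetical helpers written for this illustration, and the brute-force key search only suits small relations.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the closure of attrs under fds, given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is every attribute."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            if any(key <= set(combo) for key in keys):
                continue  # superset of a known key, so not minimal
            if closure(combo, fds) == set(attributes):
                keys.append(set(combo))
    return keys


def highest_normal_form(attributes, fds):
    universe = set(attributes)
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)
    nonprime = universe - prime
    # one (lhs, attribute) pair per non-trivial right-hand-side attribute
    singles = [(lhs, a) for lhs, rhs in fds for a in rhs if a not in lhs]

    def is_superkey(lhs):
        return closure(lhs, fds) == universe

    if all(is_superkey(lhs) for lhs, _ in singles):
        return "BCNF"
    if all(is_superkey(lhs) or a in prime for lhs, a in singles):
        return "3NF"
    # 2NF fails if a proper subset of some key determines a non-prime attribute
    partial = any(
        closure(subset, fds) & nonprime
        for key in keys
        for size in range(1, len(key))
        for subset in combinations(sorted(key), size)
    )
    return "1NF" if partial else "2NF"


print(highest_normal_form("ABCDE", [("ABC", "D"), ("CD", "AE")]))  # 1NF
```

The same function can be applied to the earlier examples in this section to cross-check their stated normal forms.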
Denormalization in Databases

Denormalization is a database optimization technique in which we add redundant data to
one or more tables. This can help us avoid costly joins in a relational database. Note
that denormalization does not mean 'reversing normalization' or 'not normalizing'. It is an
optimization technique that is applied after normalization.
In other words, denormalization is the process of taking a normalized schema and
deliberately making it non-normalized, and designers use it to tune the performance of
systems to support time-critical operations.
In a traditional normalized database, we store data in separate logical tables and attempt to
minimize redundant data. We may strive to have only one copy of each piece of data in a
database.
For example, in a normalized database, we might have a Courses table and a Teachers
table. Each entry in Courses would store the teacherID for a Course but not the
teacherName. When we need to retrieve a list of all Courses with the Teacher’s name, we
would do a join between these two tables.
In some ways, this is great; if a teacher changes his or her name, we only have to update the
name in one place.
The drawback is that if tables are large, we may spend an unnecessarily long time doing
joins on tables.
Denormalization, then, strikes a different compromise. Under denormalization, we decide
that we’re okay with some redundancy and some extra effort to update the database in order
to get the efficiency advantages of fewer joins.
Pros of Denormalization:
1. Retrieving data is faster since we do fewer joins.
2. Queries to retrieve data can be simpler (and therefore less likely to have bugs),
since we need to look at fewer tables.
Cons of Denormalization:
1. Updates and inserts are more expensive.
2. Denormalization can make update and insert code harder to write.
3. Data may be inconsistent.
4. Data redundancy necessitates more storage.
In a system that demands scalability, such as those run by large technology companies,
designers commonly use elements of both normalized and denormalized databases.
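
The Courses and Teachers example can be tried out with Python's built-in sqlite3 module. The sketch below is only illustrative: the column names beyond teacherID and teacherName, the table name CoursesDenorm, and all sample rows are assumptions made for the example. It shows the read side (no join needed on the denormalized table) and the write side (a rename touches more rows).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Teachers (teacherID INTEGER PRIMARY KEY, teacherName TEXT);
CREATE TABLE Courses (courseID INTEGER PRIMARY KEY, title TEXT,
                      teacherID INTEGER REFERENCES Teachers(teacherID));
-- Denormalized copy: teacherName is stored redundantly on every course row.
CREATE TABLE CoursesDenorm (courseID INTEGER PRIMARY KEY, title TEXT,
                            teacherID INTEGER, teacherName TEXT);
INSERT INTO Teachers VALUES (1, 'Rao'), (2, 'Mehta');
INSERT INTO Courses VALUES (10, 'ADBMS', 1), (11, 'Networks', 1), (12, 'Compilers', 2);
INSERT INTO CoursesDenorm
    SELECT c.courseID, c.title, c.teacherID, t.teacherName
    FROM Courses c JOIN Teachers t ON t.teacherID = c.teacherID;
""")

# Normalized read: needs a join between the two tables.
normalized = conn.execute(
    "SELECT c.title, t.teacherName FROM Courses c "
    "JOIN Teachers t ON t.teacherID = c.teacherID ORDER BY c.courseID"
).fetchall()

# Denormalized read: a single-table query returns the same rows.
denormalized = conn.execute(
    "SELECT title, teacherName FROM CoursesDenorm ORDER BY courseID"
).fetchall()
print(normalized == denormalized)  # True

# Renaming a teacher: one row changes in the normalized schema,
# but every matching course row changes in the denormalized one.
rows_normalized = conn.execute(
    "UPDATE Teachers SET teacherName = 'Rao-Sharma' WHERE teacherID = 1"
).rowcount
rows_denormalized = conn.execute(
    "UPDATE CoursesDenorm SET teacherName = 'Rao-Sharma' WHERE teacherID = 1"
).rowcount
print(rows_normalized, rows_denormalized)  # 1 2
```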

Multivalued Dependency

o Multivalued dependency occurs when two attributes in a table are independent of each
other but, both depend on a third attribute.
o A multivalued dependency consists of at least two attributes that are dependent on a
third attribute that's why it always requires at least three attributes.

Example: Suppose there is a bike manufacturing company that produces two colors (white
and black) of each model every year.

BIKE_MODEL   MANUF_YEAR   COLOR

M2011        2008         White

M2011        2008         Black

M3001        2013         White

M3001        2013         Black

M4006        2017         White

M4006        2017         Black

Here the columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and
independent of each other.

In this case, these two columns are said to be multivalued dependent on BIKE_MODEL.
These dependencies are written as:

BIKE_MODEL →→ MANUF_YEAR
BIKE_MODEL →→ COLOR

This can be read as "BIKE_MODEL multidetermines MANUF_YEAR" and "BIKE_MODEL
multidetermines COLOR".
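
One way to test a multivalued dependency X →→ Y on sample data is to group the rows by X and check that, within each group, every Y value appears with every value of the remaining attributes. The sketch below assumes rows stored as dictionaries and assumes that the first two rows of the bike table belong to the same model (written here as M2011); mvd_holds is a hypothetical helper name.

```python
from itertools import product


def mvd_holds(rows, x, y):
    """Check X ->> Y: within each X group, Y values and the remaining
    attributes' values must appear in every combination."""
    rest = [a for a in rows[0] if a not in x and a not in y]
    groups = {}
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        rest_val = tuple(row[a] for a in rest)
        groups.setdefault(x_val, set()).add((y_val, rest_val))
    for pairs in groups.values():
        y_vals = {pair[0] for pair in pairs}
        rest_vals = {pair[1] for pair in pairs}
        if pairs != set(product(y_vals, rest_vals)):
            return False
    return True


bikes = [
    {"BIKE_MODEL": "M2011", "MANUF_YEAR": 2008, "COLOR": "White"},
    {"BIKE_MODEL": "M2011", "MANUF_YEAR": 2008, "COLOR": "Black"},
    {"BIKE_MODEL": "M3001", "MANUF_YEAR": 2013, "COLOR": "White"},
    {"BIKE_MODEL": "M3001", "MANUF_YEAR": 2013, "COLOR": "Black"},
    {"BIKE_MODEL": "M4006", "MANUF_YEAR": 2017, "COLOR": "White"},
    {"BIKE_MODEL": "M4006", "MANUF_YEAR": 2017, "COLOR": "Black"},
]

print(mvd_holds(bikes, ["BIKE_MODEL"], ["COLOR"]))       # True
print(mvd_holds(bikes, ["BIKE_MODEL"], ["MANUF_YEAR"]))  # True

# Adding a 2014 White M3001 without a 2014 Black one breaks the independence.
broken = bikes + [{"BIKE_MODEL": "M3001", "MANUF_YEAR": 2014, "COLOR": "White"}]
print(mvd_holds(broken, ["BIKE_MODEL"], ["COLOR"]))      # False
```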

Lossless Join and Dependency Preserving Decomposition


Decomposition of a relation is done when a relation in the relational model is not in an
appropriate normal form. Ideally, a relation R is decomposed into two or more relations only
if the decomposition is lossless join as well as dependency preserving.
Lossless Join Decomposition
If we decompose a relation R into relations R1 and R2,
o The decomposition is lossy if R1 ⋈ R2 ⊃ R
o The decomposition is lossless if R1 ⋈ R2 = R
To check for a lossless join decomposition using the FD set, the following conditions must hold:
1. The union of the attributes of R1 and R2 must equal the attributes of R. Each attribute
of R must be in R1 or in R2.
Att(R1) U Att(R2) = Att(R)
2. The intersection of the attributes of R1 and R2 must not be empty.
Att(R1) ∩ Att(R2) ≠ Φ
3. The common attributes must be a key for at least one relation (R1 or R2).
Att(R1) ∩ Att(R2) -> Att(R1) or Att(R1) ∩ Att(R2) -> Att(R2)
For example, a relation R(A, B, C, D) with FD set {A -> BC} is decomposed into R1(ABC)
and R2(AD). This is a lossless join decomposition because:
1. The first condition holds, as Att(R1) U Att(R2) = (ABC) U (AD) = (ABCD) =
Att(R).
2. The second condition holds, as Att(R1) ∩ Att(R2) = (ABC) ∩ (AD) = A ≠ Φ.
3. The third condition holds, as Att(R1) ∩ Att(R2) = A is a key of R1(ABC)
because A -> BC is given.
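
The three conditions can be checked mechanically for a decomposition into two relations. The sketch below assumes single-letter attributes and FDs given as (lhs, rhs) strings; closure and is_lossless are hypothetical helper names. Condition 3 is tested by computing the closure of the common attributes and checking whether it covers R1 or R2.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r, r1, r2, fds):
    """Apply the three lossless-join conditions to a two-way decomposition."""
    r, r1, r2 = set(r), set(r1), set(r2)
    if r1 | r2 != r:          # condition 1: no attribute is lost
        return False
    common = r1 & r2
    if not common:            # condition 2: the parts must share an attribute
        return False
    reach = closure(common, fds)
    return r1 <= reach or r2 <= reach   # condition 3: common part is a key of R1 or R2


print(is_lossless("ABCD", "ABC", "AD", [("A", "BC")]))            # True
print(is_lossless("ABCD", "AB", "CD", [("A", "B"), ("C", "D")]))  # False
```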
Dependency Preserving Decomposition
If we decompose a relation R into relations R1 and R2, every dependency of R must either
be a part of R1 or R2, or be derivable from the combination of the FDs of R1 and R2.
For example, a relation R(A, B, C, D) with FD set {A -> BC} is decomposed into R1(ABC)
and R2(AD). This is dependency preserving because the FD A -> BC is a part of R1(ABC).
GATE Question: Consider a schema R(A, B, C, D) and functional dependencies A -> B
and C -> D. Then the decomposition of R into R1(AB) and R2(CD) is [GATE-CS-2001]
A. dependency preserving and lossless join
B. lossless join but not dependency preserving
C. dependency preserving but not lossless join
D. not dependency preserving and not lossless join
Answer: C. For a lossless join decomposition, all three conditions above must hold:
1. Att(R1) U Att(R2) = ABCD = Att(R), so the first condition holds.
2. Att(R1) ∩ Att(R2) = Φ, which violates the second condition. Hence the
decomposition is not lossless.
For dependency preservation, A -> B can be enforced in R1(AB) and C -> D can be
enforced in R2(CD). Hence the decomposition is dependency preserving.
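
Dependency preservation can also be tested without enumerating the FDs of each part. For every dependency X -> Y, start from X and repeatedly add whatever the closure reaches inside each part; the dependency is preserved if Y is eventually covered. The sketch below applies this standard textbook test to the GATE question, assuming single-letter attributes and FDs given as (lhs, rhs) strings; the helper names are hypothetical.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_dependency_preserving(parts, fds):
    """Return True if every FD can be enforced using the parts alone."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                # attributes reachable from result while staying inside this part
                gained = closure(result & set(part), fds) & set(part)
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True


gate_fds = [("A", "B"), ("C", "D")]
print(is_dependency_preserving(["AB", "CD"], gate_fds))  # True
print(set("AB") & set("CD"))  # set(): no common attribute, so the join is not lossless

# A decomposition that loses a dependency: B -> C spans both parts.
print(is_dependency_preserving(["AB", "AC"], [("A", "B"), ("B", "C")]))  # False
```

For the GATE question this agrees with option C: dependency preserving but not lossless join.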
