Functional Dependencies and Normalization For Relational Databases


Unit 4


Definition of a relation schema (relational schema):

• A relation schema defines the design and structure of a relation: it consists of the relation name and a set of attributes (also called field names or column names).
• Every attribute has an associated domain, the set of values it can take.

Design Guidelines for Relational Schemas (Informal Design Guidelines for Relational Schemas)

1. Semantics of the relation attributes
2. Reducing redundant values in tuples
3. Minimizing null values in tuples
4. Avoiding the generation of spurious (fake) tuples

1. Semantics of the relation attributes

• Whenever we group attributes to form a relation, we assume that a certain meaning is associated with the attributes.
• This meaning is called the “SEMANTICS” of the relation.
• It tells us how to interpret the values stored in its tuples.
• The easier the semantics are to explain, the better the relation schema.

In other words, whenever we form a relation schema, its attributes should be meaningfully related to one another. This meaning, called semantics, relates each attribute to the others.

Example:

A student's name is related to the student's USN number: each USN identifies exactly one student.

Design guideline 1:

• Design a relation schema so that its meaning is easy to understand and explain.
• To achieve this, do not combine attributes from different entity types and relationship types into a single relation.

2. Reducing redundant values in tuples

• Mixing attributes of multiple entities in one relation may cause problems:
  • Information is stored redundantly, wasting storage.
  • Update anomalies arise:
    • Insertion anomalies
    • Deletion anomalies
    • Modification anomalies
• For example, suppose student and department attributes are stored in one relation. A department may have ‘N’ students, so the Dept No and Dept Name values are repeated ‘N’ times, which leads to data redundancy.
• Insertion anomaly: a new department that has no students cannot be inserted unless the student attributes are filled with nulls.
• Deletion anomaly: if we delete the last student of a department, all information about that department is lost.
• Modification anomaly: if we change the value of a department attribute (for example, Dept Name), we must update the tuples of all students belonging to that department; otherwise the database becomes inconsistent.
• Note: design the schema so that no insertion, deletion, or modification anomalies occur.
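The anomalies above can be reproduced on a toy example. The following is a minimal Python sketch under stated assumptions: the combined STUDENT_DEPT relation, its attribute names (USN, Name, DeptNo, DeptName), its rows, and the helpers `dept_names` and `insert` are all hypothetical, and a list of dictionaries stands in for a real table.

```python
# Hypothetical STUDENT_DEPT relation mixing student and department attributes.
# Primary key: USN. DeptNo -> DeptName holds, so DeptName repeats per student.
student_dept = [
    {"USN": "S1", "Name": "Asha",  "DeptNo": 10, "DeptName": "CSE"},
    {"USN": "S2", "Name": "Ravi",  "DeptNo": 10, "DeptName": "CSE"},
    {"USN": "S3", "Name": "Meena", "DeptNo": 20, "DeptName": "ECE"},
]

def dept_names(rows):
    """Return the set of (DeptNo, DeptName) pairs still recorded in the table."""
    return {(r["DeptNo"], r["DeptName"]) for r in rows}

# Redundancy: the department name is stored once per student of that department.
redundant_copies = sum(1 for r in student_dept if r["DeptNo"] == 10)

# Insertion anomaly: a department with no students could only be stored
# with NULL (None) in the primary key USN, which is rejected here.
def insert(rows, row):
    if row["USN"] is None:
        raise ValueError("primary key USN cannot be NULL")
    rows.append(row)

try:
    insert(student_dept, {"USN": None, "Name": None, "DeptNo": 30, "DeptName": "MECH"})
    insertion_ok = True
except ValueError:
    insertion_ok = False

# Deletion anomaly: deleting the only ECE student also deletes department 20.
after_delete = [r for r in student_dept if r["USN"] != "S3"]

# Modification anomaly: renaming the department in only one tuple leaves
# two different names for DeptNo 10.
modified = [dict(r) for r in student_dept]
modified[0]["DeptName"] = "Computer Science"
names_for_10 = {r["DeptName"] for r in modified if r["DeptNo"] == 10}

print(redundant_copies)                          # 2
print(insertion_ok)                              # False
print((20, "ECE") in dept_names(after_delete))   # False
print(sorted(names_for_10))                      # ['CSE', 'Computer Science']
```

Splitting the data into separate student and department relations removes all three problems, because each department name is then stored in exactly one tuple.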

3. Minimizing null values in tuples

NULL values appear if:
• The attribute does not apply to the tuple.
• The attribute value for the tuple is unknown.
• The value is known but has not been recorded.
• Having many attributes with a lot of NULL values wastes storage and also causes problems in JOIN operations and aggregate functions.
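To see the JOIN and aggregate problems concretely, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The `employee` and `department` tables and their data are hypothetical. The sketch relies on standard SQL behaviour: `COUNT(column)` and `AVG(column)` skip NULLs, and a NULL join attribute never compares equal to anything, so that tuple silently drops out of an inner join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables: employee 3 has no department and no bonus recorded.
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, dept_id INTEGER, bonus INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, 10, 500), (2, 10, None), (3, None, None), (4, 20, 300)],
)
conn.execute("CREATE TABLE department (dept_id INTEGER, dept_name TEXT)")
conn.executemany("INSERT INTO department VALUES (?, ?)", [(10, "Finance"), (20, "HR")])

# Aggregates: COUNT(*) counts all rows, while COUNT(bonus) and AVG(bonus) ignore NULLs.
count_rows, count_bonus, avg_bonus = conn.execute(
    "SELECT COUNT(*), COUNT(bonus), AVG(bonus) FROM employee"
).fetchone()

# JOIN: a NULL dept_id is never equal to any department's dept_id,
# so employee 3 does not appear in the result.
joined = conn.execute(
    "SELECT e.emp_id FROM employee e "
    "JOIN department d ON e.dept_id = d.dept_id ORDER BY e.emp_id"
).fetchall()

print(count_rows, count_bonus, avg_bonus)  # 4 2 400.0
print(joined)                              # [(1,), (2,), (4,)]
conn.close()
```

Here `AVG(bonus)` is the average of only the two recorded bonuses, which may not be what the author of the query intended.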

GUIDELINE
• As far as possible, avoid placing attributes whose values are frequently null in a base relation. If nulls are unavoidable, they should apply only to exceptional cases and not to the majority of tuples.

4. Avoiding the generation of spurious (fake) tuples

• A relation should be decomposed so that the resulting relations can be joined back on primary key/foreign key attributes. Splitting on a non-key attribute and later joining on it generates spurious tuples, i.e., incorrect information.
GUIDELINE
• Design relation schemas so that they can be joined with equality conditions on primary key/foreign key attributes, which guarantees that no spurious tuples are generated.
• Do not have relations that contain matching attributes other than primary key/foreign key combinations. If such relations are unavoidable, do not join them on those attributes.
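The effect of joining on a non-key attribute can be shown with a minimal Python sketch. The EMP_PROJ relation, its attributes (Emp, Project, Location) and its rows are hypothetical; relations are modelled as sets of tuples and joins as set comprehensions rather than with a database library. In this data Project -> Location holds, so Project is a key of the (Project, Location) projection.

```python
# Hypothetical EMP_PROJ relation (Emp, Project, Location); key is (Emp, Project).
emp_proj = {
    ("E1", "P1", "Delhi"),
    ("E1", "P2", "Pune"),
    ("E2", "P3", "Delhi"),
}

# Bad decomposition: the only common attribute is Location, which is not a key.
r1 = {(emp, loc) for emp, _, loc in emp_proj}     # (Emp, Location)
r2 = {(proj, loc) for _, proj, loc in emp_proj}   # (Project, Location)

def join_on_location(left, right):
    """Equi-join (Emp, Location) with (Project, Location) on Location."""
    return {(emp, proj, loc)
            for emp, loc in left
            for proj, loc2 in right
            if loc == loc2}

rejoined = join_on_location(r1, r2)
spurious = rejoined - emp_proj
print(len(emp_proj), len(rejoined))  # 3 5
print(sorted(spurious))              # [('E1', 'P3', 'Delhi'), ('E2', 'P1', 'Delhi')]

# Better decomposition: join on Project, which is a key of (Project, Location).
e_p = {(emp, proj) for emp, proj, _ in emp_proj}
p_l = {(proj, loc) for _, proj, loc in emp_proj}
rejoined_on_key = {(emp, proj, loc)
                   for emp, proj in e_p
                   for proj2, loc in p_l
                   if proj == proj2}
print(rejoined_on_key == emp_proj)   # True
```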

Functional Dependency

• A functional dependency (FD) describes how the value of one attribute (or set of attributes) determines the value of another attribute in a database management system (DBMS).
• Functional dependencies help maintain the quality of data in the database.
• A functional dependency is denoted by an arrow →. If Y is functionally dependent on X, we write X → Y: any two tuples that have the same value of X must also have the same value of Y.
• Functional dependencies play a vital role in distinguishing good database designs from bad ones.

Or:
A functional dependency is a relationship that exists when one attribute uniquely determines another attribute.

Example

The following example makes functional dependency easier to understand.
We have a <Department> table with two attributes, DeptId and DeptName:

DeptId = Department ID
DeptName = Department Name

DeptId is the primary key. Here, DeptId uniquely identifies the DeptName attribute: each DeptId value is associated with exactly one DeptName, so knowing the DeptId tells you the department name.

DeptId   DeptName
001      Finance
002      Marketing
003      HR

Therefore, the functional dependency between DeptId and DeptName is that DeptName is functionally dependent on DeptId:

DeptId -> DeptName
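The definition of X -> Y (any two tuples that agree on X also agree on Y) can be checked mechanically against a table's rows. The function `fd_holds` below is a hypothetical helper written for illustration. Keep in mind that a single table instance can only show that an FD is violated; whether DeptId -> DeptName holds for all possible data is a decision about the schema. The extra row with DeptName "Audit" is made up to show a violation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this table instance, i.e. any two
    rows that agree on every attribute in lhs also agree on every attribute in rhs."""
    seen = {}  # maps each lhs value to the rhs value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same X value with a different Y value: FD violated
    return True

department = [
    {"DeptId": "001", "DeptName": "Finance"},
    {"DeptId": "002", "DeptName": "Marketing"},
    {"DeptId": "003", "DeptName": "HR"},
]
print(fd_holds(department, ["DeptId"], ["DeptName"]))  # True

# Hypothetical extra row: DeptId 001 paired with a different name breaks the FD.
bad_department = department + [{"DeptId": "001", "DeptName": "Audit"}]
print(fd_holds(bad_department, ["DeptId"], ["DeptName"]))  # False
```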


Advantages of Functional Dependency

• Functional dependencies help avoid data redundancy, so the same data is not repeated at multiple locations in the database.
• They help maintain the quality of data in the database.
• They help define the meaning and constraints of a database.
• They help identify bad designs.
• They help establish the facts about the database design.

Normalization
Definition of normalization:
Normalization is the process of analysing the given relation schemas based on their functional dependencies and primary keys to achieve desirable properties such as:
1. Minimizing redundancy
2. Minimizing insertion, deletion and update anomalies

Categories of Normal Forms

1. First Normal Form (1NF)

For a table to be in First Normal Form, it should follow these 4 rules:
• It should only have single-valued (atomic) attributes/columns.
• Values stored in a column should be of the same domain.
• All the columns in a table should have unique names.
• The order in which data is stored does not matter.

• A relation is in 1NF if every attribute is single-valued, i.e., it contains no multi-valued or composite attributes and every attribute is atomic. A composite or multi-valued attribute violates 1NF.
• To fix this, we can create a new row for each value of the multi-valued attribute, which converts the table into 1NF.

Example:
Let’s take an example of a relational table <EmployeeDetail> that contains the details of the employees of a company.
• Here, Employee Phone Number is a multi-valued attribute, so this relation is not in 1NF.

Solution:

• To convert this table into 1NF, we create a new row for each Employee Phone Number, as shown below:
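Since the original <EmployeeDetail> table is not reproduced here, the following Python sketch uses hypothetical employee codes, names and phone numbers. The multi-valued attribute is stored as a list, and the hypothetical helper `to_1nf` emits one row per value, which is the conversion described above.

```python
# Hypothetical <EmployeeDetail> rows (made-up data).
# "Employee Phone Number" holds a list, i.e. a multi-valued attribute: not 1NF.
employee_detail = [
    {"Employee Code": "E1", "Employee Name": "Asha",
     "Employee Phone Number": ["9000000001", "9000000002"]},
    {"Employee Code": "E2", "Employee Name": "Ravi",
     "Employee Phone Number": ["9000000003"]},
]

def to_1nf(rows, multi_valued_attr):
    """Return new rows with one row per value of the multi-valued attribute."""
    flat = []
    for row in rows:
        for value in row[multi_valued_attr]:
            new_row = dict(row)               # copy the other (atomic) attributes
            new_row[multi_valued_attr] = value
            flat.append(new_row)
    return flat

employee_detail_1nf = to_1nf(employee_detail, "Employee Phone Number")
for row in employee_detail_1nf:
    print(row["Employee Code"], row["Employee Name"], row["Employee Phone Number"])
# E1 Asha 9000000001
# E1 Asha 9000000002
# E2 Ravi 9000000003
```

After this conversion, Employee Code alone no longer identifies a row; in this sketch the pair (Employee Code, Employee Phone Number) does.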

Second Normal Form (2NF)


For a relational table to be in second normal form, it must satisfy the following rules:

1. The table must be in first normal form.
2. It must not contain any partial dependency, i.e., every non-prime attribute must be fully functionally dependent on the primary key, not on just a part of a composite primary key.

If a partial dependency exists, we can divide the table: the partially dependent attributes are moved to another table, together with the part of the key that determines them.

Let us take the following <EmployeeProjectDetail> table as an example to understand what a partial dependency is and how to normalize the table to second normal form:

In the above table, the prime attributes are Employee Code and Project ID. The table has partial dependencies because Employee Name can be determined by Employee Code alone, and Project Name can be determined by Project ID alone. Thus, the above relational table violates 2NF.

Solution

To remove the partial dependencies from this table and normalize it into second normal form, we can decompose the <EmployeeProjectDetail> table into the following three tables:
Thus, we’ve converted the <EmployeeProjectDetail> table into 2NF by decomposing it into the <EmployeeDetail>, <ProjectDetail> and <EmployeeProject> tables. These tables satisfy both rules of 2NF: they are in 1NF, and every non-prime attribute is fully dependent on the primary key of its table.
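A minimal Python sketch of this decomposition follows. Because the original table is not reproduced here, the rows (employee codes, names, project IDs and project names) are hypothetical, and `project` is a hypothetical helper implementing relational projection with duplicate removal. The final check rejoins the three tables on Employee Code and Project ID to confirm that no information was lost.

```python
def project(rows, attrs):
    """Relational projection: keep only `attrs` and drop duplicate tuples."""
    return sorted({tuple(row[a] for a in attrs) for row in rows})

# Hypothetical <EmployeeProjectDetail> rows; the key is (Employee Code, Project ID).
employee_project_detail = [
    {"Employee Code": "E1", "Employee Name": "Asha", "Project ID": "P1", "Project Name": "Payroll"},
    {"Employee Code": "E1", "Employee Name": "Asha", "Project ID": "P2", "Project Name": "Website"},
    {"Employee Code": "E2", "Employee Name": "Ravi", "Project ID": "P1", "Project Name": "Payroll"},
]

# Partial dependencies: Employee Code -> Employee Name, Project ID -> Project Name.
employee_detail = project(employee_project_detail, ["Employee Code", "Employee Name"])
project_detail = project(employee_project_detail, ["Project ID", "Project Name"])
employee_project = project(employee_project_detail, ["Employee Code", "Project ID"])

print(employee_detail)   # [('E1', 'Asha'), ('E2', 'Ravi')]
print(project_detail)    # [('P1', 'Payroll'), ('P2', 'Website')]
print(employee_project)  # [('E1', 'P1'), ('E1', 'P2'), ('E2', 'P1')]

# Rejoin on the keys to check that the decomposition is lossless.
name_of_employee = dict(employee_detail)
name_of_project = dict(project_detail)
rejoined = sorted(
    (emp, name_of_employee[emp], proj, name_of_project[proj])
    for emp, proj in employee_project
)
original = project(
    employee_project_detail,
    ["Employee Code", "Employee Name", "Project ID", "Project Name"],
)
print(rejoined == original)  # True
```

Each employee name and project name is now stored once, instead of once per (employee, project) pair.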

Third Normal Form (3NF)

Normalizing 2NF relations to 3NF involves eliminating transitive dependencies.

A functional dependency X -> Z is said to be transitive if the following three conditions hold:

• X -> Y
• Y -> X does not hold
• Y -> Z

For a relational table to be in third normal form, it must satisfy the following rules:

1. The table must be in second normal form.
2. No non-prime attribute is transitively dependent on the primary key.
3. For each non-trivial functional dependency X -> Z, at least one of the following conditions holds:
   • X is a super key of the table.
   • Z is a prime attribute of the table.

If a transitive dependency exists, we can divide the table: the transitively dependent attributes are moved to a new table along with a copy of their determinant.

Let us take the following <EmployeeDetail> table as an example to understand what a transitive dependency is and how to normalize the table to third normal form:

The above table is not in 3NF because it has the transitive dependency Employee Code -> Employee City, since:

• Employee Code -> Employee Zipcode
• Employee Zipcode -> Employee City

Also, Employee Zipcode is not a super key and Employee City is not a prime attribute.

To remove the transitive dependency from this table and normalize it into third normal form, we can decompose the <EmployeeDetail> table into the following two tables:
Thus, we’ve converted the <EmployeeDetail> table into 3NF by decomposing it into the <EmployeeDetail> and <EmployeeLocation> tables, since both are in 2NF and have no transitive dependency.
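The following Python sketch illustrates this decomposition with hypothetical rows for <EmployeeDetail> (the codes, names, zip codes and cities are made up, since the original table is not reproduced). It assumes <EmployeeLocation> holds (Employee Zipcode, Employee City). It first checks both links of the chain Employee Code -> Employee Zipcode -> Employee City with a hypothetical `fd_holds` helper, then builds the two tables by projection and confirms that joining them on Employee Zipcode reproduces the original rows.

```python
def project(rows, attrs):
    """Relational projection: keep only `attrs` and drop duplicate tuples."""
    return sorted({tuple(row[a] for a in attrs) for row in rows})

def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical <EmployeeDetail> rows (made-up data).
employee_detail = [
    {"Employee Code": "E1", "Employee Name": "Asha", "Employee Zipcode": "560001", "Employee City": "Bengaluru"},
    {"Employee Code": "E2", "Employee Name": "Ravi", "Employee Zipcode": "560001", "Employee City": "Bengaluru"},
    {"Employee Code": "E3", "Employee Name": "Meena", "Employee Zipcode": "110001", "Employee City": "New Delhi"},
]

# Both links of the transitive chain hold in this data.
print(fd_holds(employee_detail, ["Employee Code"], ["Employee Zipcode"]))  # True
print(fd_holds(employee_detail, ["Employee Zipcode"], ["Employee City"]))  # True

# Decompose: the city moves to a new table together with its determinant, the zip code.
new_employee_detail = project(employee_detail, ["Employee Code", "Employee Name", "Employee Zipcode"])
employee_location = project(employee_detail, ["Employee Zipcode", "Employee City"])
print(employee_location)  # [('110001', 'New Delhi'), ('560001', 'Bengaluru')]

# Joining back on Employee Zipcode reproduces the original rows.
city_of_zip = dict(employee_location)
rejoined = sorted((code, name, zipcode, city_of_zip[zipcode])
                  for code, name, zipcode in new_employee_detail)
original = project(employee_detail,
                   ["Employee Code", "Employee Name", "Employee Zipcode", "Employee City"])
print(rejoined == original)  # True
```

In this data "Bengaluru" is now stored once per zip code rather than once per employee.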

Boyce–Codd Normal Form (BCNF)

Boyce-Codd Normal Form is a stricter version of 3NF: it imposes additional constraints compared to 3NF.

For a relational table to be in Boyce-Codd normal form, it must satisfy the following rules:

1. The table must be in third normal form.
2. For every non-trivial functional dependency X -> Y, X is a superkey of the table.
In particular, unlike 3NF, BCNF does not allow a dependency X -> Y in which Y is a prime attribute but X is not a superkey.

A superkey is a set of one or more attributes that can uniquely identify a row in a
database table.
Let us take the following <EmployeeProjectLead> table as an example to understand how to normalize a table to BCNF:

The above table satisfies all the normal forms up to 3NF, but it violates BCNF. Its candidate key is {Employee Code, Project ID}, and it also has the non-trivial functional dependency Project Leader -> Project ID. Project Leader is not a superkey of the table, so this dependency is not allowed in BCNF. (Because Project Leader -> Project ID, {Employee Code, Project Leader} is also a candidate key; the table is still in 3NF because Project ID is a prime attribute.)

To convert the given table into BCNF, we decompose it into two tables:
Thus, we’ve converted the <EmployeeProjectLead> table into BCNF by
decomposing it into <EmployeeProject> and <ProjectLead> tables.
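The 3NF and BCNF conditions can be tested from the functional dependencies alone using attribute closure, a standard technique for finding superkeys that is not described in these notes. The Python sketch below assumes the table has exactly the attributes Employee Code, Project ID and Project Leader, with exactly two FDs: {Employee Code, Project ID} -> Project Leader and Project Leader -> Project ID. It also assumes <ProjectLead> holds (Project Leader, Project ID). The helper names `closure`, `is_superkey`, `candidate_keys` and `violations` are hypothetical; the code checks only the listed FDs and finds candidate keys by brute force, which is adequate only for small schemas.

```python
from itertools import combinations

EC, PID, PL = "Employee Code", "Project ID", "Project Leader"
ATTRS = frozenset({EC, PID, PL})
# Assumed FDs for <EmployeeProjectLead>.
FDS = [
    (frozenset({EC, PID}), frozenset({PL})),
    (frozenset({PL}), frozenset({PID})),
]

def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def is_superkey(attrs, all_attrs, fds):
    return closure(attrs, fds) >= all_attrs

def candidate_keys(all_attrs, fds):
    """Minimal superkeys, found by trying subsets from smallest to largest."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            combo = frozenset(combo)
            if is_superkey(combo, all_attrs, fds) and not any(k <= combo for k in keys):
                keys.append(combo)
    return keys

def violations(all_attrs, fds, prime, allow_prime_rhs):
    """Non-trivial FDs X -> Y whose left side X is not a superkey.
    allow_prime_rhs=True is the 3NF test (Y made only of prime attributes is tolerated);
    allow_prime_rhs=False is the BCNF test."""
    bad = []
    for lhs, rhs in fds:
        extra = rhs - lhs
        if not extra or is_superkey(lhs, all_attrs, fds):
            continue
        if allow_prime_rhs and extra <= prime:
            continue
        bad.append((set(lhs), set(rhs)))
    return bad

keys = candidate_keys(ATTRS, FDS)
prime = frozenset().union(*keys)
print(violations(ATTRS, FDS, prime, allow_prime_rhs=True))   # []  (in 3NF)
print(violations(ATTRS, FDS, prime, allow_prime_rhs=False))  # [({'Project Leader'}, {'Project ID'})]

# Assumed <ProjectLead>(Project Leader, Project ID): Project Leader is its key.
PL_ATTRS = frozenset({PL, PID})
PL_FDS = [(frozenset({PL}), frozenset({PID}))]
pl_prime = frozenset().union(*candidate_keys(PL_ATTRS, PL_FDS))
print(violations(PL_ATTRS, PL_FDS, pl_prime, allow_prime_rhs=False))  # []  (in BCNF)
```

Under these assumed FDs the sketch reports {Employee Code, Project Leader} as a second candidate key, so the BCNF violation comes from Project Leader not being a superkey.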
